
The relationship between optimal and biologically

plausible decoding of stimulus velocity

in the retina

Edmund C. Lalor,1,* Yashar Ahmadian,2 and Liam Paninski2

1Trinity Centre for Bioengineering and Institute of Neuroscience, Trinity College Dublin,

College Green, Dublin 2, Ireland

2Department of Statistics, Columbia University, 1255 Amsterdam Avenue, New York, New York 10027, USA

* Corresponding author: edlalor@tcd.ie

Received January 30, 2009; revised June 14, 2009; accepted July 23, 2009;

posted August 7, 2009 (Doc. ID 106996); published September 11, 2009

A major open problem in systems neuroscience is to understand the relationship between behavior and the

detailed spiking properties of neural populations. We assess how faithfully velocity information can be decoded

from a population of spiking model retinal neurons whose spatiotemporal receptive fields and ensemble spike

train dynamics are closely matched to real data. We describe how to compute the optimal Bayesian estimate of

image velocity given the population spike train response and show that, in the case of global translation of an

image with known intensity profile, on average the spike train ensemble signals speed with a fractional stan-

dard deviation of about 2% across a specific set of stimulus conditions. We further show how to compute the

Bayesian velocity estimate in the case where we only have some a priori information about the (naturalistic)

spatial correlation structure of the image but do not know the image explicitly. As expected, the performance of

the Bayesian decoder is shown to be less accurate with decreasing prior image information. There turns out to

be a close mathematical connection between a biologically plausible “motion energy” method for decoding the

velocity and the Bayesian decoder in the case that the image is not known. Simulations using the motion en-

ergy method and the Bayesian decoder with unknown image reveal that they result in fractional standard

deviations of 10% and 6%, respectively, across the same set of stimulus conditions. Estimation performance is

rather insensitive to the details of the precise receptive field location, correlated activity between cells, and

spike timing. © 2009 Optical Society of America

OCIS codes: 330.4060, 330.4150.

1. INTRODUCTION

The question of how different attributes of a visual stimu-

lus are represented by populations of cells in the retina

has been addressed in a number of recent studies [1–8].

This field has received a major boost with the advent of

methods for obtaining large-scale simultaneous record-

ings from multiple retinal ganglion neurons that almost

completely tile a substantial region of the visual field

[9,10]. The utility of this new method for understanding

the encoding of behaviorally relevant signals was exem-

plified by [4], where the authors examined the question of

how reliably visual motion was encoded in the spiking ac-

tivity of a population of macaque parasol cells. These au-

thors used a simple moving stimulus and attempted to es-

timate the velocity of that stimulus from the resulting

spike train ensemble; this analysis pointed to some im-

portant constraints on the visual system’s ability to de-

code image velocity given noisy spike train responses. We

will explore these issues in more depth in this paper.

In parallel to these advances in retinal recording tech-

nology, significant recent advances have also been made

in our ability to model the statistical properties of popu-

lations of spiking neurons. For example, a statistical

model of a complete population of primate parasol retinal

ganglion cells (RGCs) was recently described [7]. This

model was fit using data acquired by the array recording

techniques mentioned above and includes spike-history

effects and cross-coupling between cells of the same kind

and of different kinds (i.e., ON and OFF cells). The au-

thors demonstrated that the model accurately captures

the stimulus dependence and spatiotemporal correlation

structure of RGC population responses, and allows sev-

eral insights to be made into the retinal neural code. One

such insight concerns the role of correlated activity in pre-

serving sensory information. Using pseudorandom binary

stimuli and Bayesian inference, they reported that stimu-

lus decoding based on the spiking output of the model pre-

served 20% more information when knowledge of the cor-

relation structure was used than when the responses

were considered independently [7].

At the psychophysical level, Bayesian inference has

been established as an effective framework for under-

standing visual perception [11]; some recent notable ap-

plications to understanding visual velocity processing in-

clude [12–17]. In particular, [14] argued that a number of

visual illusions actually arise naturally in a system that

attempts to estimate local image velocity via Bayesian

methods (though see also [18,19]).

Links between retinal coding and psychophysical be-

havior have also been recently examined using Bayesian

methods; [20,21], for example, examined the contribution of turtle RGC responses to velocity and acceleration encoding. These studies reported that the instantaneous firing rates of individual turtle RGCs contain information about

Lalor et al.
Vol. 26, No. 11 / November 2009 / J. Opt. Soc. Am. A B25
1084-7529/09/110B25-18/$15.00 © 2009 Optical Society of America


speed, direction, and acceleration of moving patterns. The

firing-rate-based Bayesian stimulus reconstruction car-

ried out in that study involved a couple of key approxima-

tions. These included the assumptions that RGCs gener-

ate spikes according to Poisson statistics and that they do

so independently of each other. The work of [7] empha-

sizes that these assumptions are unrealistic, but the im-

pact of detailed spike timing and correlation information

on velocity decoding remains uncertain.

The primary goal of this paper is to investigate the fi-

delity with which the velocity of a visual stimulus may be

estimated, given the detailed spiking responses of the pri-

mate RGC population model of [7], using Bayesian decod-

ers, with and without full prior knowledge of the image.

We begin by describing the mathematical construction of

the Bayesian decoders, and then compare these estimates

to those based on a biologically plausible “net motion sig-

nal” derived directly from the spike trains without any

prior image information [4]. We derive a mathematical

connection between these decoders and investigate the

decoders’ performance through a series of simulations.

2. METHODS

A. Model

The generalized linear model (GLM) [22,23] for the spik-

ing responses of the sensory network used in this study

was described in detail in [7]. It consists of an array of ON

and OFF retinal ganglion cells (RGC) with specific base-

line firing rates. Given the spatiotemporal image movie

sequence, the model generates a mean firing rate for each

cell, taking into account the temporal dynamics and the

center-surround spatial stimulus filtering properties of

the cells. Then, incorporating spike history effects and

cross-coupling between cells of the same type and of the

opposite type, it generates spikes for each cell as a sto-

chastic point process.

In response to the visual stimulus I, the ith cell in the observed population emits a spike train, which we represent by a response function

    r_i(t) = \sum_\alpha \delta(t - t_{i,\alpha}),        (1)

where each spike is represented by a delta function, and t_{i,\alpha} is the time of the \alpha th spike of the ith neuron. We use the shorthand notation r_i and r for the response function of one neuron and the collective spike train responses of all neurons, respectively. The stimulus I represents the spatiotemporal luminance profile I(n,t) of a movie as a function of the pixel position n and time t.

In the GLM framework, the intensity functions (instantaneous firing rates) of the responses r_i are given by [7,24–26]

    \lambda_i(t) = f\Big( b_i + J_i(t) + \sum_{j,\beta} h_{ij}(t - t_{j,\beta}) \Big),        (2)

where f(·) is a positive, strictly increasing rectifying function. As in [7], we adopt the choice f(·) = exp(·). The b_i represents the log of the baseline firing rate of the cell, the coupling terms h_{ij} model the within- and between-neuron spike history effects noted above, and the stimulus input J_i(t) is obtained from I by linearly filtering the spatiotemporal luminance,

    J_i(t) = \int\!\!\int k_i(t - \tau, n)\, I(\tau, n)\, d^2 n\, d\tau,        (3)

where k_i(t,n) is the spatiotemporal receptive field of cell i. The parameters for each cell were fit using 7 min of

spiking data recorded during the presentation of a nonre-

peating stimulus, with the baseline log firing rate being a

constant and the various filter parameters being fit using

a basis of raised cosine “bumps” [7]. Given Eq. (2), we can write down the point process log-likelihood in the standard way [27]

    \log p(r|I) = \sum_{i,\alpha} \log \lambda_i(t_{i,\alpha}) - \sum_i \int_0^T \lambda_i(t)\, dt.        (4)

For movies arising from images rigidly moving with constant velocity v we have

    I(t,n) = x(n - vt),        (5)

where x(n) is the luminance profile of a fixed image. Substituting Eq. (5) into Eq. (3) and shifting the integration variable n by v\tau, we obtain

    J_i(t) = \int K_{i,v}(t;n)\, x(n)\, d^2 n,        (6)

where we defined

    K_{i,v}(t;n) \equiv \int k_i(t - \tau, n + v\tau)\, d\tau.        (7)

In the following we replace p(r|I) with its equivalent p(r|x,v) [since, via Eq. (5), I is given in terms of x and v] and use the shorthand matrix notation J_i = K_{i,v} · x for Eq. (6). An important point is that in the case of a convex and log-concave GLM nonlinearity f(·) [conditions that hold for our choice, f(·) = exp(·)], the GLM log-likelihood, Eq. (4), is a concave function of x(n).
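For intuition, the shifted kernel of Eq. (7) can be built directly on a pixel/time grid. The sketch below is illustrative only: it assumes motion along a single spatial axis, a circular spatial boundary, and pixel-rounded shifts, none of which are assumptions made in the paper.

```python
import numpy as np

def shifted_kernel(k, v, dt, dx):
    """Discretize K_{i,v}(t;n) = integral k_i(t - tau, n + v*tau) d(tau), Eq. (7).

    k      : (n_t, n_x) receptive field k_i(t, n) sampled along the motion axis
    v      : putative speed, in pixels per second along n
    dt, dx : temporal and spatial bin sizes
    """
    n_t, _ = k.shape
    K = np.zeros_like(k, dtype=float)
    for ti in range(n_t):            # output time index t
        for tau in range(ti + 1):    # integration variable, with t - tau >= 0
            # evaluate k at position n + v*tau, rounded to the pixel grid
            shift = int(round(v * tau * dt / dx))
            K[ti] += np.roll(k[ti - tau], -shift) * dt
    return K
```

With this in hand, the discretized stimulus input of Eq. (6) is just the matrix product of each time row of K with the image vector x.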

B. Decoding

In order to estimate the speed of the moving bar given the

simulated output spike trains r of our RGC population,

we employed three distinct methods. The first method in-

volved a Bayesian decoder with full image information,

the second method utilized a Bayesian decoder with less

than full image information, while the third method in-

volved an “energy-based” algorithm introduced by [4] that

used no explicit prior knowledge of the image. For reasons

that will become clear, these decoders will be hereafter

known as the optimal decoder, the marginal decoder, and

the energy method, respectively. Given a simulated out-

put spike train ensemble, we use each of these methods to

estimate the speed of the stimulus that evoked the en-

semble by maximizing some function across a range of

possible or “putative” speeds.

1. Bayesian Velocity Estimation

To compute the optimal Bayesian velocity decoder we

need to evaluate the posterior probability for the velocity


p(v|r) conditional on the observed spike trains r. Given a prior distribution p_v(v), from Bayes’ rule we obtain

    p(v|r) = \frac{p(r|v)\, p_v(v)}{\int p(r|v')\, p_v(v')\, dv'}.        (8)

If the image x (e.g., a narrow bar with a luminance distinct from the background) is known to the decoder, then we can replace p(r|v) with the likelihood function p(r|x,v), obtaining

    p(v|r,x) = \frac{p(r|x,v)\, p_v(v)}{\int p(r|x,v')\, p_v(v')\, dv'}.        (9)

Here p(r|x,v) is provided by the forward model, Eq. (4), and therefore computation of the posterior probability is straightforward in this case.
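In practice we evaluate Eq. (9) on a grid of putative speeds. A minimal sketch follows; the helper `log_lik_fn` standing in for log p(r|x,v) from Eq. (4) is hypothetical, a uniform velocity prior is assumed (so it cancels), and a log-sum-exp shift keeps the normalization numerically stable.

```python
import numpy as np

def velocity_posterior(log_lik_fn, speeds):
    """Posterior p(v|r,x) over a grid of putative speeds, Eq. (9).

    log_lik_fn : function mapping a putative speed v to log p(r|x,v)
    speeds     : 1D array of putative speeds
    """
    log_lik = np.array([log_lik_fn(v) for v in speeds])
    log_lik -= log_lik.max()          # stabilize before exponentiating
    post = np.exp(log_lik)            # uniform prior p_v(v) cancels here
    return post / post.sum()

# toy check: a log-likelihood peaked at 14.4 deg/s recovers that speed
speeds = np.linspace(7.2, 28.8, 101)
posterior = velocity_posterior(lambda v: -(v - 14.4) ** 2, speeds)
v_hat = speeds[np.argmax(posterior)]
```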

Alternatively, if the image is not fully known, we represent the decoder’s uncertain a priori knowledge regarding x with an image prior distribution p_x(x). In this case, p(r|v) is obtained by marginalization over x:

    p(r|v) = \int p(r,x|v)\, dx = \int p(r|x,v)\, p_x(x)\, dx.        (10)

Hence, we will refer to p(r|v) as the marginal likelihood. Given the marginal likelihood, Eq. (8) allows us to calculate Bayesian estimates for general velocity priors. The prior distribution p_x(x), which describes the statistics of the image ensemble, can be chosen to have a naturalistic correlation structure. In our simulations in Section 3 we use a Gaussian image ensemble with power spectrum matched to observations in natural images [28,29].

In general, the calculation of the high-dimensional integral over x in Eq. (10) is a difficult task. However, when the integrand p(r,x|v) is sharply peaked around its maximum [which is the maximum a posteriori (MAP) estimate for x, as the integrand is proportional to the posterior image distribution p(x|r,v) by Bayes’ rule], the so-called “Laplace” approximation (also known as the “saddle-point” approximation) provides an accurate estimate for this integral [for applications of this approximation in the Bayesian setting, see, e.g., [30]]. The Laplace approximation in the context of neural decoding is further discussed in, e.g., [31–35]. We briefly review this approximation here.

Following [29], we consider Gaussian image priors with zero mean and covariance C_x chosen to match the power spectrum of natural images [28]. Let us define the function

    L(x,r,v) \equiv \log p_x(x) + \log p(r|x,v) + \frac{1}{2} \log\big[(2\pi)^d |C_x|\big],        (11)

where d represents the number of pixels in our simulated image, and rewrite Eq. (10) as

    p(r|v) = \frac{1}{\sqrt{(2\pi)^d |C_x|}} \int e^{L(x,r,v)}\, dx.        (12)

Using Eq. (4) and p_x(x) = N(0, C_x), we obtain the expression

    L(x,r,v) = -\frac{1}{2} x^T C_x^{-1} x + \sum_i \Big( \sum_\alpha \log \lambda_i(t_{i,\alpha}; x, r) - \int \lambda_i(t; x, r)\, dt \Big),        (13)

where the \lambda_i are given by Eqs. (2), (6), and (7), and we have made their dependence on x and r manifest. Since both terms in Eq. (13) are concave (see the closing remarks in Subsection 2.A), the log-posterior L(x,r,v) is concave in x. To obtain the Laplace approximation, for fixed r, we first find the value of x that maximizes L (i.e., the image MAP, x_MAP). When the integrand is sharply concentrated around its maximum, we can Taylor expand L around x_MAP to the first nonvanishing order beyond the zeroth order (i.e., its maximum value) and neglect the rest of the expansion. Since at the maximum the gradient of L and hence the first-order term vanish, we obtain

    L(x,r,v) \approx L(x_{MAP},r,v) - \frac{1}{2} (x - x_{MAP})^T H(r,v) (x - x_{MAP}),        (14)

where the negative Hessian matrix

    H(r,v) \equiv -\nabla_x \nabla_x L(x,r,v)\big|_{x = x_{MAP}}        (15)

is positive semidefinite due to the maximum condition. Exponentiating this yields the Gaussian approximation (up to normalization)

    e^{L(x,r,v)} \propto p(x|r,v) \approx N\big(x_{MAP}(r,v),\, C_x(r,v)\big)        (16)

for the integrand of Eq. (12), where N(\mu, C) denotes a Gaussian density with mean \mu and covariance C. [An important technical point here is that this Gaussian approximation is partially justified by the fact that the log-posterior (13) is a concave function of x [24,26,34] and therefore has a single global optimum, like the Gaussian (16).] Here, the posterior image covariance C_x(r,v) is given by the inverse of the negative Hessian matrix H(r,v). (Note the dependence on both the observed responses r and the putative velocity v.) The elementary Gaussian integration in Eq. (12) then yields

    p(r|v) \approx \frac{e^{L(x_{MAP}(r,v),r,v)}}{\sqrt{|C_x H(r,v)|}}        (17)

for the marginal likelihood, or its logarithm

    \log p(r|v) \approx L(x_{MAP}(r,v),r,v) - \frac{1}{2} \log |C_x H(r,v)|.        (18)

The MAP itself is found from the condition \nabla_x L = 0, which in the case of the exponential GLM nonlinearity f(·) = exp(·) yields the equation

    x_{MAP}(n; r,v) = \int d^2 n'\, C_x(n,n') \sum_i \int K_{i,v}(t;n') \big[ r_i(t) - \lambda_i(t; x_{MAP}, r) \big]\, dt.        (19)

Note that this equation is nonlinear due to the appearance of x_MAP inside the GLM nonlinearity on the right-hand side. As mentioned above, the objective function Eq. (11) is concave and can be efficiently optimized using


gradient-based optimization algorithms, such as the Newton–Raphson method. In particular, by exploiting the quasi-locality of the GLM likelihood we can implement the Newton–Raphson method such that, in cases where the image x(n) depends only on one component of n, the MAP can be found in a computational time scaling only linearly with the spatial size of the image (see Appendix B for further elaboration on this point). Once x_MAP is found, the Hessian at the MAP and Eq. (17) can be calculated easily, and using Eq. (17), the approximate computation of p(r|v) is complete.

To recapitulate, in the case of an a priori uncertain image, given the observed spike trains r, we numerically find x_MAP(r,v) for a range of putative velocities v and, using Eq. (17), we compute p(r|v), from which we may obtain p(v|r) via Eq. (8). We then take the value of velocity v* that maximizes p(v|r) as the estimate; i.e., we use the MAP estimate for the velocity.
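The recipe of Eqs. (11)–(18) can be sketched compactly. In the sketch below, `loglik`, `grad`, and `hess` are hypothetical callables evaluating log p(r|x,v) and its derivatives for fixed r and putative v; concavity of Eq. (13) guarantees the Newton ascent converges to x_MAP.

```python
import numpy as np

def laplace_log_marginal(loglik, grad, hess, Cx, n_newton=25):
    """Laplace approximation to log p(r|v), Eqs. (13)-(18) (sketch).

    loglik/grad/hess : log p(r|x,v) and its gradient/Hessian as functions of x
    Cx               : prior image covariance C_x (d x d)
    Returns log p(r|v) up to the constant absorbed into Eq. (11).
    """
    Cinv = np.linalg.inv(Cx)
    x = np.zeros(Cx.shape[0])
    for _ in range(n_newton):              # Newton ascent on the concave L(x,r,v)
        g = -Cinv @ x + grad(x)            # gradient of L, from Eq. (13)
        H = Cinv - hess(x)                 # negative Hessian H(r,v), Eq. (15)
        step = np.linalg.solve(H, g)
        x = x + step
        if np.max(np.abs(step)) < 1e-12:
            break
    L_map = -0.5 * x @ Cinv @ x + loglik(x)        # L(x_MAP,r,v), Eq. (13)
    H = Cinv - hess(x)
    _, logdet = np.linalg.slogdet(Cx @ H)
    return L_map - 0.5 * logdet                    # Eq. (18)
```

For a linear-Gaussian likelihood the log-posterior is exactly quadratic, so the approximation becomes exact; this provides a convenient sanity check.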

As discussed in Section 1, our goal here was to critically examine the role of the detailed spiking structure of the GLM in constraining our estimates of the velocity. Since the spiking network model structure enters here only via the likelihood term p(r|v), we did not systematically examine the effect of strong a priori beliefs p(v) on the resulting estimator (as discussed at further length, e.g., in [14]). Instead we used a simple uniform prior on velocity, which renders the MAP velocity estimate equivalent to the maximum (marginal) likelihood estimate, i.e., the value of v that maximizes p(r|v) given by the approximation Eq. (17) [or, equivalently, its logarithm Eq. (18)]. Similarly, in the case of an a priori known image x we chose the velocity v that maximizes the likelihood p(r|x,v).

2. Velocity Estimation Using the Energy Method

In order to assess the precision of our Bayesian estimates

of velocity, we compared our estimates to those obtained

using the correlation-based algorithm described in [4].

This algorithm closely resembles the spatiotemporal en-

ergy models for motion processing introduced by [36]. In

order to understand the rationale behind this method, as-

sume, hypothetically, that all the cells have exactly the

same receptive fields up to the positioning of their centers

and that they respond reliably and without noise to the

stimulus. Then the RGCs’ spike trains riin response to

moving images would clearly be identical up to time

translations. In other words, ri?t+ni/v? would be equal for

all i, where niis the center position of the ith cell’s recep-

tive field along the axis of motion, and v is the magnitude

of v. Thus even in the realistic, noisy situation, we expect

the rifor different i to have a large overlap if they are

shifted in time as described, and in principle, we should

be able to recover the true velocity by maximizing a

smoothed version of this overlap. Inspired by this obser-

vation, an energy function is constructed as follows. First,

the spike trains are convolved with a Gaussian filter

w?t??exp?−t2/2?2? (we chose ? to be 10 ms; see below and

[4]). Let us define

r ˜i?t? = w?ri=?

?

e−?t − ti,??2/2?2.

?20?

Then, the “energy” function for the entire population of cells is determined by the sum of the overlaps of the shifted and smoothed responses of all cells [37],

    E(v,r) = \sum_{i,j} \int \tilde{r}_i\Big(t + \frac{n_i}{v}\Big)\, \tilde{r}_j\Big(t + \frac{n_j}{v}\Big)\, dt = \int \Big[ \sum_i \tilde{r}_i\Big(t + \frac{n_i}{v}\Big) \Big]^2\, dt.        (21)

In order to cancel the effect of the spontaneous activity of the cells, in [4] a “net motion signal” N(v,r) is obtained by subtracting the energy of the left-shifted spike trains from that of the right-shifted responses:

    N(v,r) \equiv E(v,r) - E(-v,r).        (22)

Finally, N(v,r) is calculated for v across a range of putative velocities, and the value that maximizes the net motion signal is taken as the velocity estimate. Figure 1 illustrates the basic idea of this method for ON cells, although it should be noted that OFF cells are also included in our analysis.
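Equations (20)–(22) translate directly into code. The sketch below evaluates the smoothed, shifted population sum on a fixed time grid; the grid, the per-cell spike-time lists, and the treatment of edges are illustrative choices, not taken from [4].

```python
import numpy as np

def energy(spike_times, centers, v, t, sigma=0.01):
    """E(v,r) of Eq. (21): integral of the squared shifted population sum.

    spike_times : list of spike-time arrays, one per cell
    centers     : receptive-field center positions n_i along the motion axis
    v           : putative (signed) speed; sigma : Gaussian filter width (10 ms)
    t           : uniform time grid over which the overlap is integrated
    """
    total = np.zeros_like(t)
    for times, n_i in zip(spike_times, centers):
        shifted = t + n_i / v          # evaluate r~_i at t + n_i/v, Eq. (20)
        total += np.exp(-(shifted[:, None] - np.asarray(times)) ** 2
                        / (2.0 * sigma ** 2)).sum(axis=1)
    return np.sum(total ** 2) * (t[1] - t[0])

def net_motion_signal(spike_times, centers, v, t, sigma=0.01):
    """Net motion signal N(v,r) = E(v,r) - E(-v,r), Eq. (22)."""
    return (energy(spike_times, centers, v, t, sigma)
            - energy(spike_times, centers, -v, t, sigma))
```

Scanning `net_motion_signal` over a grid of putative speeds and taking the argmax reproduces the estimator of [4].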

3. Connection between the Bayesian and Energy-Based

Methods

An interesting connection can be drawn between Baye-

sian velocity decoding and the method of Subsection 2.B.2

based on the energy function Eq. (21). For simplicity,

Fig. 1. Ensemble motion signals. (A) Moving bar stimulus and cell layout. Cells in the leftmost column are numbered 1–10 from top to bottom, cells in the second column are numbered 11–20 from top to bottom, etc. (B) Raw responses from the ON cells for a moving bar with speed 14.4°/s. Each tick represents one spike and each row represents the response of a different cell. (C)–(E) Same spike trains circularly shifted by an amount equal to the time required for a stimulus with the indicated putative speed (7.2, 14.4, and 28.8°/s) to move from an arbitrary reference location to the receptive field center. Responses from OFF cells were also included in this procedure.


imagine that spike trains are generated not by the GLM,

but rather by a simpler linear-Gaussian (LG) model. In

this case, it turns out that the marginal likelihood method

is closely related to the energy function method described

above. Specifically, we model the output spike trains as

    r_i = b_i + K_{i,v} \cdot x + \epsilon_i,        (23)

where the noise term is Gaussian, \epsilon_i \sim N(0, \sigma^2). In the case that the noise terms for different cells are independent, we have p_{LG}(r|x,v) = \prod_i N(b_i + K_{i,v} \cdot x, \sigma^2), though the generalization to correlated outputs is straightforward. We show in Appendix A that in a certain regime the logarithm of the LG marginal likelihood is given by [see Eqs. (A7) and (A8) and Eq. (A18)]

    \log p_{LG}(r|v) = \frac{1}{2} \sum_{i,j} \int R_i\Big(t + \frac{n_i}{v}\Big)\, R_j\Big(t + \frac{n_j}{v}\Big)\, dt + A(v),        (24)

where A(v) has no dependence on the observed spike trains and only a weak dependence on v. We find empirically that the term A(v) in Eq. (24) grows with velocity, and therefore its inclusion shifts the value of the maximum likelihood estimate toward higher velocities. Conversely, its absence in the energy function Eq. (21) causes the energy method estimate to have a negative bias. See Fig. 5 for an illustration of this effect. The resemblance of the remaining term to Eq. (21) above is clear. Here, the R_i are smoothed versions of the spike trains r_i (with the baseline log firing rate b_i subtracted out) and are given, as in Eq. (20), by

    R_i = w_{LG} * (r_i - b_i),        (25)

where here the optimal smoothing filter w_LG is determined by the receptive fields k_i, the prior image correlation statistics, and the velocity [its explicit form is given in Eq. (A21) in Appendix A], as we discuss in more depth below.
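The data-dependent term of Eq. (24) can be sketched on binned spike trains. In the sketch below, `w_lg` is treated as a given 1D filter (the true w_LG of Eq. (A21) depends on the k_i, the image prior, and v), and the circular shift and `mode="same"` convolution are simplifying assumptions for illustration.

```python
import numpy as np

def lg_marginal_term(spike_counts, baselines, centers, v, w_lg, dt):
    """Spike-dependent part of log p_LG(r|v), Eqs. (24)-(25) (sketch).

    spike_counts : (n_cells, n_bins) binned spike trains r_i
    baselines    : per-cell baselines b_i, in spikes per bin
    centers      : receptive-field center positions n_i along the motion axis
    """
    total = np.zeros(spike_counts.shape[1])
    for r_i, b_i, n_i in zip(spike_counts, baselines, centers):
        R_i = np.convolve(r_i - b_i, w_lg, mode="same")     # Eq. (25)
        shift = int(round(n_i / (v * dt)))                  # R_i(t + n_i/v)
        total += np.roll(R_i, -shift)
    # (1/2) sum_{i,j} of pairwise overlaps = (1/2) * integral of total^2
    return 0.5 * np.sum(total ** 2) * dt
```

As in the energy method, scanning this quantity over putative speeds and taking the argmax gives the (approximate) maximum marginal-likelihood estimate, up to the omitted A(v) term.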

Thus maximizing the marginal likelihood Eq. (24) is, to a good approximation, equivalent to maximizing the energy Eq. (21). The major difference between Eq. (21) and Eq. (24) is in the filter we apply to the spike trains: \tilde{r}_i has been replaced by R_i. The key point is that R_i depends on the stimulus filters k_i, the velocity v, and the image prior in an optimal manner, unlike the smoothing in Eq. (20). (Note that, while changes in optimal filters at differing light levels have been discussed in terms of motion estimation in fly vision [38], no account of varying light levels was taken here.) The dependence of this optimal filter as a function of v can be explained fairly intuitively, as we discuss at more length in Appendix A following Eq. (A21).

We find that ?w, the time scale of the smoothing filter wLG,

is dictated by three major time scales, some of which de-

pend on the velocity v: ?k, the width of the time window in

which each RGC integrates its input; lk/v, where lkis the

spatial width of the receptive field; and lcorr/v where lcorr

is the correlation length of natural images. At low veloci-

ties, lk/v and lcorr/v are large, and the smoothing time

scale ?wis also large, since in this case we gain more in-

formation about the underlying firing rates by averaging

over a longer time window. At high velocities, on the other

hand, ?kdominates lk/v and lcorr/v, and ?w??k. This set-

ting of ?wmakes sense because although the image movie

I can vary quite quickly here, the filtered input Ji?t? in-

duces a firing rate correlation time of order ?k, and exam-

ining the responses at a temporal resolution finer than ?k

only decreases the effective signal-to-noise.

Figure 2 illustrates these effects by plotting the optimal smoothing filters w_LG for several different values of the velocity v. Interestingly, in the high-velocity limit, the analytically derived optimal temporal filter width \tau_w is of the order of 10 ms, which was the value chosen empirically for the optimal Gaussian filter used in [4]. We recomputed the optimal empirical filter for our simulated data here by plotting the standard deviation of the velocity estimates obtained using the net motion signal [defined in Eq. (22)] against the filter width (Fig. 3). For this velocity (28.8°/s) the optimal filter width is of the order of 10 ms; thus, we used a filter of width 10 ms when comparing the energy method to the Bayesian decoder.

To summarize, maximizing the likelihood marginalized

over the unknown image is very closely related to maxi-

mizing the energy function introduced by [4], if we replace

the GLM with the simpler linear Gaussian model. Since

the actual spike train generation is much better modeled

by the GLM than by the Gaussian model, we expect Baye-

sian velocity estimation (even with uncertain prior knowl-

edge of the image) based on the correct GLM to be more

accurate. This expectation is borne out by our simula-

tions, though it is worth noting that the improvement is

significantly smaller than when the Bayesian decoder has

access to the exact image.

C. Simulations

We simulated the presentation of a bar moving across the

gray background of a CRT monitor refreshing at 120 Hz.

Fig. 2. Optimal linear spike train filter w_LG for a range of velocities from 0.2°/s (top) to 28.8°/s (bottom) in exponential steps. The y axes are scaled in dimensionless units for clarity here. As discussed in Subsection 2.B.3, there are three time scales that determine the time scale of our filter w_LG. At low velocities, shown in the upper panels, the width of w(t) is determined by the two scales l_k/v and l_corr/v and is thus quite large (since the denominator v is small). At the higher velocities shown in the lower panels, the optimal filter width is dominated by the time scale of the receptive field \tau_k, and is of the order of \tau_k, which is \approx 10–20 ms. For even higher velocities the shape of this filter remains essentially the same as in the bottom panel.
