
The relationship between optimal and biologically plausible decoding of stimulus velocity in the retina

Edmund C. Lalor,1,* Yashar Ahmadian,2 and Liam Paninski2

1Trinity Centre for Bioengineering and Institute of Neuroscience, Trinity College Dublin, College Green, Dublin 2, Ireland
2Department of Statistics, Columbia University, 1255 Amsterdam Avenue, New York, New York 10027, USA
*Corresponding author: edlalor@tcd.ie

Received January 30, 2009; revised June 14, 2009; accepted July 23, 2009; posted August 7, 2009 (Doc. ID 106996); published September 11, 2009

A major open problem in systems neuroscience is to understand the relationship between behavior and the detailed spiking properties of neural populations. We assess how faithfully velocity information can be decoded from a population of spiking model retinal neurons whose spatiotemporal receptive fields and ensemble spike train dynamics are closely matched to real data. We describe how to compute the optimal Bayesian estimate of image velocity given the population spike train response and show that, in the case of global translation of an image with known intensity profile, on average the spike train ensemble signals speed with a fractional standard deviation of about 2% across a specific set of stimulus conditions. We further show how to compute the Bayesian velocity estimate in the case where we only have some a priori information about the (naturalistic) spatial correlation structure of the image but do not know the image explicitly. As expected, the performance of the Bayesian decoder is shown to be less accurate with decreasing prior image information. There turns out to be a close mathematical connection between a biologically plausible "motion energy" method for decoding the velocity and the Bayesian decoder in the case that the image is not known. Simulations using the motion energy method and the Bayesian decoder with unknown image reveal that they result in fractional standard deviations of 10% and 6%, respectively, across the same set of stimulus conditions. Estimation performance is rather insensitive to the details of the precise receptive field location, correlated activity between cells, and spike timing. © 2009 Optical Society of America

OCIS codes: 330.4060, 330.4150.

1. INTRODUCTION

The question of how different attributes of a visual stimulus are represented by populations of cells in the retina has been addressed in a number of recent studies [1–8]. This field has received a major boost with the advent of methods for obtaining large-scale simultaneous recordings from multiple retinal ganglion neurons that almost completely tile a substantial region of the visual field [9,10]. The utility of this new method for understanding the encoding of behaviorally relevant signals was exemplified by [4], where the authors examined the question of how reliably visual motion was encoded in the spiking activity of a population of macaque parasol cells. These authors used a simple moving stimulus and attempted to estimate the velocity of that stimulus from the resulting spike train ensemble; this analysis pointed to some important constraints on the visual system's ability to decode image velocity given noisy spike train responses. We will explore these issues in more depth in this paper.

In parallel to these advances in retinal recording technology, significant recent advances have also been made in our ability to model the statistical properties of populations of spiking neurons. For example, a statistical model of a complete population of primate parasol retinal ganglion cells (RGCs) was recently described [7]. This model was fit using data acquired by the array recording techniques mentioned above and includes spike-history effects and cross-coupling between cells of the same kind and of different kinds (i.e., ON and OFF cells). The authors demonstrated that the model accurately captures the stimulus dependence and spatiotemporal correlation structure of RGC population responses, and allows several insights to be made into the retinal neural code. One such insight concerns the role of correlated activity in preserving sensory information. Using pseudorandom binary stimuli and Bayesian inference, they reported that stimulus decoding based on the spiking output of the model preserved 20% more information when knowledge of the correlation structure was used than when the responses were considered independently [7].

At the psychophysical level, Bayesian inference has been established as an effective framework for understanding visual perception [11]; some recent notable applications to understanding visual velocity processing include [12–17]. In particular, [14] argued that a number of visual illusions actually arise naturally in a system that attempts to estimate local image velocity via Bayesian methods (though see also [18,19]).

Links between retinal coding and psychophysical behavior have also been recently examined using Bayesian methods; [20,21], for example, examine the contribution of turtle RGC responses to velocity and acceleration encoding. This study reported that the instantaneous firing rates of individual turtle RGCs contain information about the speed, direction, and acceleration of moving patterns. The firing-rate-based Bayesian stimulus reconstruction carried out in that study involved a couple of key approximations. These included the assumptions that RGCs generate spikes according to Poisson statistics and that they do so independently of each other. The work of [7] emphasizes that these assumptions are unrealistic, but the impact of detailed spike timing and correlation information on velocity decoding remains uncertain.

Lalor et al., J. Opt. Soc. Am. A, Vol. 26, No. 11, November 2009, B25. 1084-7529/09/110B25-18/$15.00 © 2009 Optical Society of America

The primary goal of this paper is to investigate the fidelity with which the velocity of a visual stimulus may be estimated, given the detailed spiking responses of the primate RGC population model of [7], using Bayesian decoders with and without full prior knowledge of the image. We begin by describing the mathematical construction of the Bayesian decoders, and then compare these estimates to those based on a biologically plausible "net motion signal" derived directly from the spike trains without any prior image information [4]. We derive a mathematical connection between these decoders and investigate the decoders' performance through a series of simulations.

2. METHODS

A. Model
The generalized linear model (GLM) [22,23] for the spiking responses of the sensory network used in this study was described in detail in [7]. It consists of an array of ON and OFF retinal ganglion cells (RGCs) with specific baseline firing rates. Given the spatiotemporal image movie sequence, the model generates a mean firing rate for each cell, taking into account the temporal dynamics and the center-surround spatial stimulus filtering properties of the cells. Then, incorporating spike-history effects and cross-coupling between cells of the same type and of the opposite type, it generates spikes for each cell as a stochastic point process.

In response to the visual stimulus I, the ith cell in the observed population emits a spike train, which we represent by a response function

  r_i(t) = \sum_\alpha \delta(t - t_{i,\alpha}),    (1)

where each spike is represented by a delta function, and t_{i,\alpha} is the time of the \alpha th spike of the ith neuron. We use the shorthand notation r_i and r for the response function of one neuron and the collective spike train responses of all neurons, respectively. The stimulus I represents the spatiotemporal luminance profile I(n,t) of a movie as a function of the pixel position n and time t.

In the GLM framework, the intensity functions (instantaneous firing rates) of the responses r_i are given by [7,24–26]

  \lambda_i(t) \equiv f\Big( b_i + J_i(t) + \sum_{j,\alpha} h_{ij}(t - t_{j,\alpha}) \Big),    (2)

where f(·) is a positive, strictly increasing rectifying function. As in [7], we adopt the choice f(·) = exp(·). The b_i represents the log of the baseline firing rate of the cell, the coupling terms h_{ij} model the within- and between-neuron spike-history effects noted above, and the stimulus input J_i(t) is obtained from I by linearly filtering the spatiotemporal luminance,

  J_i(t) = \int\!\!\int k_i(t - \tau, n) I(\tau, n) \, d^2n \, d\tau,    (3)

where k_i(t,n) is the spatiotemporal receptive field of cell i. The parameters for each cell were fit using 7 min of spiking data recorded during the presentation of a nonrepeating stimulus, with the baseline log firing rate being a constant and the various filter parameters being fit using a basis of raised cosine "bumps" [7]. Given Eq. (2), we can write down the point process log-likelihood in the standard way [27]:

  \log p(r|I) = \sum_{i,\alpha} \log \lambda_i(t_{i,\alpha}) - \sum_i \int_0^T \lambda_i(t) \, dt.    (4)

For movies arising from images rigidly moving with constant velocity v we have

  I(t,n) = x(n - vt),    (5)

where x(n) is the luminance profile of a fixed image. Substituting Eq. (5) into Eq. (3) and shifting the integration variable n by v\tau, we obtain

  J_i(t) = \int K_{i,v}(t; n) x(n) \, d^2n,    (6)

where we defined

  K_{i,v}(t; n) \equiv \int k_i(t - \tau, n + v\tau) \, d\tau.    (7)

In the following we replace p(r|I) with its equivalent p(r|x,v) [since, via Eq. (5), I is given in terms of x and v] and use the shorthand matrix notation J_i = K_{i,v} · x for Eq. (6). An important point is that in the case of a convex and log-concave GLM nonlinearity f(·) [conditions that hold for our choice, f(·) = exp(·)], the GLM log-likelihood, Eq. (4), is a concave function of x(n).
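The discretized form of Eqs. (2) and (4) can be sketched in a few lines of code. This is an illustrative toy implementation, not the fitted model of [7]: the binary spike bins, the one-bin causal shift of the coupling terms, and all array shapes are our own simplifying assumptions.

```python
import numpy as np

def glm_intensity(spikes, J, b, h):
    """Eq. (2) with f = exp: lambda_i(t) = exp(b_i + J_i(t) + sum_j (h_ij * r_j)(t)).

    spikes : (ncells, nbins) binary spike-count array
    J      : (ncells, nbins) stimulus drive J_i(t) of Eq. (3)
    b      : (ncells,) baseline log firing rates
    h      : (ncells, ncells, nh) coupling/history filters h_ij
    """
    ncells, nbins = spikes.shape
    log_lam = J + b[:, None]
    for i in range(ncells):
        for j in range(ncells):
            # convolve the jth spike train with the filter h_ij
            hist = np.convolve(spikes[j], h[i, j])[:nbins]
            # shift by one bin so the coupling is strictly causal
            log_lam[i, 1:] += hist[:-1]
    return np.exp(log_lam)

def glm_log_likelihood(spikes, lam, dt):
    """Discretized point-process log-likelihood, Eq. (4):
    sum of log lambda over spike bins minus the integral of lambda."""
    return np.sum(spikes * np.log(lam)) - np.sum(lam) * dt
```

Because the log-likelihood is concave in the filtered image drive, this objective is well suited to the gradient-based MAP computations described below.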

B. Decoding
In order to estimate the speed of the moving bar given the simulated output spike trains r of our RGC population, we employed three distinct methods. The first method involved a Bayesian decoder with full image information, the second utilized a Bayesian decoder with less than full image information, and the third involved an "energy-based" algorithm introduced by [4] that uses no explicit prior knowledge of the image. For reasons that will become clear, these decoders will hereafter be known as the optimal decoder, the marginal decoder, and the energy method, respectively. Given a simulated output spike train ensemble, we use each of these methods to estimate the speed of the stimulus that evoked the ensemble by maximizing some function across a range of possible or "putative" speeds.

1. Bayesian Velocity Estimation
To compute the optimal Bayesian velocity decoder we need to evaluate the posterior probability p(v|r) for the velocity conditional on the observed spike trains r. Given a prior distribution p_v(v), from Bayes' rule we obtain

  p(v|r) = \frac{p(r|v) p_v(v)}{\int p(r|v') p_v(v') \, dv'}.    (8)

If the image x (e.g., a narrow bar with a luminance distinct from the background) is known to the decoder, then we can replace p(r|v) with the likelihood function p(r|x,v), obtaining

  p(v|r,x) = \frac{p(r|x,v) p_v(v)}{\int p(r|x,v') p_v(v') \, dv'}.    (9)

Here p(r|x,v) is provided by the forward model, Eq. (4), and therefore computation of the posterior probability is straightforward in this case.
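On a discretized grid of putative speeds, Eq. (9) amounts to adding log-likelihood and log-prior and normalizing. A minimal sketch follows; the log-likelihood values in the toy example are made-up placeholders standing in for evaluations of the forward model, Eq. (4).

```python
import numpy as np

def velocity_posterior(log_lik, log_prior):
    """Eq. (9) on a discretized grid of putative speeds.

    log_lik   : (nv,) log p(r | x, v_k) for each putative speed v_k
    log_prior : (nv,) log p_v(v_k)
    """
    log_post = log_lik + log_prior
    log_post -= log_post.max()      # subtract the max for numerical stability
    post = np.exp(log_post)
    return post / post.sum()        # normalization, denominator of Eq. (9)

# toy example: 3 putative speeds, uniform prior
log_lik = np.array([-10.0, -5.0, -10.0])
post = velocity_posterior(log_lik, np.log(np.ones(3) / 3))
v_hat_index = np.argmax(post)       # index of the MAP speed estimate
```

Working in the log domain and subtracting the maximum before exponentiating avoids underflow, since point-process log-likelihoods over long spike trains are typically large negative numbers.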

Alternatively, if the image is not fully known, we represent the decoder's uncertain a priori knowledge regarding x with an image prior distribution p_x(x). In this case, p(r|v) is obtained by marginalization over x:

  p(r|v) = \int p(r,x|v) \, dx = \int p(r|x,v) p_x(x) \, dx.    (10)

Hence, we will refer to p(r|v) as the marginal likelihood. Given the marginal likelihood, Eq. (8) allows us to calculate Bayesian estimates for general velocity priors. The prior distribution p_x(x), which describes the statistics of the image ensemble, can be chosen to have a naturalistic correlation structure. In our simulations in Section 3 we use a Gaussian image ensemble with power spectrum matched to observations in natural images [28,29].
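One way to realize such a naturalistic Gaussian ensemble is to shape white noise in the Fourier domain so that its power spectrum falls off as 1/|f|^α. The exponent α = 2, the low-frequency regularizer, and the grid size below are illustrative assumptions, not the exact spectrum used in the simulations.

```python
import numpy as np

def sample_naturalistic_image(n, alpha=2.0, seed=0):
    """Sample from a zero-mean Gaussian image prior whose power spectrum
    falls off as 1/|f|^alpha (alpha = 2 is a common fit to natural images;
    the exact exponent here is an illustrative assumption)."""
    rng = np.random.default_rng(seed)
    fx = np.fft.fftfreq(n)[:, None]
    fy = np.fft.fftfreq(n)[None, :]
    # regularize at |f| -> 0 with a 1/n floor to avoid division by zero
    power = 1.0 / (fx**2 + fy**2 + (1.0 / n)**2) ** (alpha / 2)
    power[0, 0] = 0.0                 # zero mean: remove the DC component
    white = rng.standard_normal((n, n))
    x = np.fft.ifft2(np.sqrt(power) * np.fft.fft2(white)).real
    return x / x.std()                # normalize to unit variance
```

Because the prior is stationary, its covariance C_x is diagonalized by the Fourier basis, which is what makes this sampling (and the quadratic form in the log-posterior below) cheap to evaluate.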

In general, the calculation of the high-dimensional integral over x in Eq. (10) is a difficult task. However, when the integrand p(r,x|v) is sharply peaked around its maximum [which is the maximum a posteriori (MAP) estimate for x, as the integrand is proportional to the posterior image distribution p(x|r,v) by Bayes' rule], the so-called "Laplace" approximation (also known as the "saddle-point" approximation) provides an accurate estimate for this integral [for applications of this approximation in the Bayesian setting, see, e.g., [30]]. The Laplace approximation in the context of neural decoding is further discussed in, e.g., [31–35]. We briefly review this approximation here.

Following [29], we consider Gaussian image priors with zero mean and covariance C_x chosen to match the power spectrum of natural images [28]. Let us define the function

  L(x,r,v) \equiv \log p_x(x) + \log p(r|x,v) + \frac{1}{2} \log\big[ (2\pi)^d |C_x| \big],    (11)

where d represents the number of pixels in our simulated image, and rewrite Eq. (10) as

  p(r|v) = \frac{1}{\sqrt{(2\pi)^d |C_x|}} \int e^{L(x,r,v)} \, dx.    (12)

Using Eq. (4) and p_x(x) = N(0, C_x), we obtain the expression

  L(x,r,v) = -\frac{1}{2} x^T C_x^{-1} x + \sum_i \Big[ \sum_\alpha \log \lambda_i(t_{i,\alpha}; x, r) - \int \lambda_i(t; x, r) \, dt \Big],    (13)

where the \lambda_i are given by Eqs. (2), (6), and (7), and we have made their dependence on x and r explicit. Since both terms in Eq. (13) are concave (see the closing remarks in Subsection 2.A), the log-posterior L(x,r,v) is concave in x. To obtain the Laplace approximation, for fixed r, we first find the value of x that maximizes L (i.e., the image MAP, x_MAP). When the integrand is sharply concentrated around its maximum, we can Taylor expand L around x_MAP to the first nonvanishing order beyond the zeroth order (i.e., its maximum value) and neglect the rest of the expansion. Since at the maximum the gradient of L, and hence the first-order term, vanishes, we obtain

  L(x,r,v) \approx L(x_{MAP},r,v) - \frac{1}{2} (x - x_{MAP})^T H(r,v) (x - x_{MAP}),    (14)

where the negative Hessian matrix

  H(r,v) \equiv -\nabla_x \nabla_x L(x,r,v) \big|_{x = x_{MAP}}    (15)

is positive semidefinite due to the maximum condition. Exponentiating this yields the Gaussian approximation (up to normalization)

  e^{L(x,r,v)} \propto p(x|r,v) \approx N(x_{MAP}(r,v), C_x(r,v))    (16)

for the integrand of Eq. (12), where N(\mu,C) denotes a Gaussian density with mean \mu and covariance C. [An important technical point here is that this Gaussian approximation is partially justified by the fact that the log-posterior (13) is a concave function of x [24,26,34] and therefore has a single global optimum, like the Gaussian (16).] Here, the posterior image covariance C_x(r,v) is given by the inverse of the negative Hessian matrix H(r,v). (Note the dependence on both the observed responses r and the putative velocity v.) The elementary Gaussian integration in Eq. (12) then yields

  p(r|v) \approx \frac{e^{L(x_{MAP}(r,v),r,v)}}{\sqrt{|C_x H(r,v)|}}    (17)

for the marginal likelihood, or, for its logarithm,

  \log p(r|v) \approx L(x_{MAP}(r,v),r,v) - \frac{1}{2} \log |C_x H(r,v)|.    (18)

The MAP itself is found from the condition \nabla_x L = 0, which in the case of the exponential GLM nonlinearity f(·) = exp(·) yields the equation

  x_{MAP}(n; r,v) = \int d^2n' \, C_x(n,n') \sum_i \int K_{i,v}(t; n') \big[ r_i(t) - \lambda_i(t; x_{MAP}, r) \big] \, dt.    (19)

Note that this equation is nonlinear due to the appearance of x_MAP inside the GLM nonlinearity on the right-hand side. As mentioned above, the objective function Eq. (11) is concave and can be efficiently optimized using gradient-based optimization algorithms, such as the Newton–Raphson method. In particular, by exploiting the quasi-locality of the GLM likelihood we can implement the Newton–Raphson method such that, in cases where the image x(n) depends only on one component of n, the MAP can be found in a computational time scaling only linearly with the spatial size of the image (see Appendix B for further elaboration on this point). Once x_MAP is found, the Hessian at the MAP can be calculated easily, and, using Eq. (17), the approximate computation of p(r|v) is complete.

To recapitulate, in the case of an a priori uncertain image, given the observed spike trains r, we numerically find x_MAP(r,v) for a range of putative velocities v and, using Eq. (17), we compute p(r|v), from which we may obtain p(v|r) via Eq. (8). We then take the value of velocity v that maximizes p(v|r) as the estimate; i.e., we use the MAP estimate for the velocity.
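The per-velocity computation of Eqs. (17)–(19) can be sketched as follows. The function names and the plain Newton–Raphson loop are illustrative; in the actual decoder the callables `L`, `grad_L`, and `hess_L` would be supplied by the GLM objective of Eq. (13), and the iteration would exploit the quasi-locality mentioned above.

```python
import numpy as np

def log_marginal_likelihood(grad_L, hess_L, L, x0, Cx, n_iter=50):
    """Laplace approximation, Eq. (18):
    log p(r|v) ~= L(x_MAP) - (1/2) log |C_x H|, with H = -grad grad L(x_MAP).

    grad_L, hess_L, L : callables giving the gradient, Hessian, and value
                        of the concave objective L(x, r, v) of Eq. (11),
                        with r and the putative v held fixed.
    """
    x = x0.copy()
    for _ in range(n_iter):          # Newton-Raphson ascent on the concave L
        H = -hess_L(x)               # negative Hessian (positive definite)
        x = x + np.linalg.solve(H, grad_L(x))
    # log-determinant of C_x H via slogdet for numerical stability
    sign, logdet = np.linalg.slogdet(Cx @ (-hess_L(x)))
    return L(x) - 0.5 * logdet, x    # Eq. (18) and x_MAP
```

Repeating this for each putative v and then normalizing as in Eq. (8) (here with a uniform velocity prior) gives the marginal decoder's estimate as the argmax over the speed grid.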

As discussed in Section 1, our goal here was to critically examine the role of the detailed spiking structure of the GLM in constraining our estimates of the velocity. Since the spiking network model structure enters here only via the likelihood term p(r|v), we did not systematically examine the effect of strong a priori beliefs p(v) on the resulting estimator (as discussed at further length, e.g., in [14]). Instead we used a simple uniform prior on velocity, which renders the MAP velocity estimate equivalent to the maximum (marginal) likelihood estimate, i.e., the value of v that maximizes p(r|v) given by the approximation Eq. (17) [or, equivalently, its logarithm, Eq. (18)]. Similarly, in the case of an a priori known image x we chose the velocity v that maximizes the likelihood p(r|x,v).

2. Velocity Estimation Using the Energy Method
In order to assess the precision of our Bayesian estimates of velocity, we compared our estimates to those obtained using the correlation-based algorithm described in [4]. This algorithm closely resembles the spatiotemporal energy models for motion processing introduced by [36]. To understand the rationale behind this method, assume, hypothetically, that all the cells have exactly the same receptive fields up to the positioning of their centers and that they respond reliably and without noise to the stimulus. Then the RGCs' spike trains r_i in response to moving images would clearly be identical up to time translations. In other words, r_i(t + n_i/v) would be equal for all i, where n_i is the center position of the ith cell's receptive field along the axis of motion and v is the magnitude of the velocity. Thus, even in the realistic, noisy situation, we expect the r_i for different i to have a large overlap if they are shifted in time as described, and in principle we should be able to recover the true velocity by maximizing a smoothed version of this overlap. Inspired by this observation, an energy function is constructed as follows. First, the spike trains are convolved with a Gaussian filter w(t) \propto \exp(-t^2/2\sigma^2) (we chose \sigma to be 10 ms; see below and [4]). Let us define

  \tilde{r}_i(t) = w * r_i = \sum_\alpha e^{-(t - t_{i,\alpha})^2 / 2\sigma^2}.    (20)

Then, the "energy" function for the entire population of cells is determined by the sum of the overlaps of the shifted and smoothed responses of all cells [37],

  E(v,r) = \sum_{i,j} \int \tilde{r}_i\Big(t + \frac{n_i}{v}\Big) \tilde{r}_j\Big(t + \frac{n_j}{v}\Big) \, dt = \int \Big[ \sum_i \tilde{r}_i\Big(t + \frac{n_i}{v}\Big) \Big]^2 \, dt.    (21)

In order to cancel the effect of the spontaneous activity of the cells, in [4] a "net motion signal" N(v,r) is obtained by subtracting the energy of the left-shifted spike trains from that of the right-shifted responses:

  N(v,r) \equiv E(v,r) - E(-v,r).    (22)

Finally, N(v,r) is calculated across a range of putative velocities v, and the value that maximizes the net motion signal is taken as the velocity estimate. Figure 1 illustrates the basic idea of this method for ON cells, although it should be noted that OFF cells are also included in our analysis.
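Eqs. (20)–(22) translate directly into code. The sketch below uses toy spike times and receptive field centers; the Gaussian smoothing width defaults to the 10 ms value used in the text.

```python
import numpy as np

def net_motion_signal(spike_times, centers, v, t_grid, sigma=0.010):
    """Eqs. (20)-(22): Gaussian-smooth each spike train, shift it by n_i/v,
    integrate the squared population sum, and subtract the energy of the
    oppositely shifted trains.

    spike_times : list of arrays, spike times of each cell (s)
    centers     : (ncells,) receptive field centers along the motion axis
    v           : putative velocity (signed)
    t_grid      : uniform time grid for the numerical integral
    """
    def energy(vv):                       # Eq. (21)
        total = np.zeros_like(t_grid)
        for ts, ni in zip(spike_times, centers):
            shifted_t = t_grid + ni / vv
            # Eq. (20): smoothed spike train evaluated on the shifted grid
            total += np.exp(-(shifted_t[:, None] - ts[None, :]) ** 2
                            / (2 * sigma ** 2)).sum(axis=1)
        return np.sum(total ** 2) * (t_grid[1] - t_grid[0])
    return energy(v) - energy(-v)         # Eq. (22)
```

The velocity estimate is then the putative speed maximizing N(v, r) over a grid, as described above; note that no image information enters this computation.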

Fig. 1. Ensemble motion signals. (A) Moving bar stimulus and cell layout. Cells in the leftmost column are numbered 1–10 from top to bottom, cells in the second column are numbered 11–20 from top to bottom, etc. (B) Raw responses from the ON cells for a moving bar with speed 14.4°/s. Each tick represents one spike and each row represents the response of a different cell. (C)–(E) Same spike trains circularly shifted (putative speeds 7.2, 14.4, and 28.8°/s, respectively) by an amount equal to the time required for a stimulus with the indicated putative speed to move from an arbitrary reference location to the receptive field center. Responses from OFF cells were also included in this procedure.

3. Connection between the Bayesian and Energy-Based Methods
An interesting connection can be drawn between Bayesian velocity decoding and the method of Subsection 2.B.2, based on the energy function Eq. (21). For simplicity,

imagine that spike trains are generated not by the GLM, but rather by a simpler linear-Gaussian (LG) model. In this case, it turns out that the marginal likelihood method is closely related to the energy function method described above. Specifically, we model the output spike trains as

  r_i = b_i + K_{i,v} \cdot x + \epsilon_i,    (23)

where the noise terms \epsilon_i are Gaussian. In the case that the noise terms for different cells are independent, p_{LG}(r|x,v) factorizes over cells as a product of Gaussian densities with means b_i + K_{i,v} \cdot x, though the generalization to correlated outputs is straightforward. We show in Appendix A that in a certain regime the logarithm of the LG marginal likelihood is given by [see Eqs. (A7), (A8), and (A18)]

  \log p_{LG}(r|v) = \frac{1}{2} \sum_{i,j} \int R_i\Big(t + \frac{n_i}{v}\Big) R_j\Big(t + \frac{n_j}{v}\Big) \, dt + A(v),    (24)

where A(v) has no dependence on the observed spike trains and only a weak dependence on v. We find empirically that the term A(v) in Eq. (24) grows with velocity, and therefore its inclusion shifts the value of the maximum likelihood estimate toward higher velocities. Conversely, its absence in the energy function Eq. (21) causes the energy method estimate to have a negative bias. See Fig. 5 for an illustration of this effect. The resemblance of the remaining term to Eq. (21) above is clear. Here, the R_i are smoothed versions of the spike trains r_i (with the baseline log firing rate subtracted out) and are given, as in Eq. (20), by

  R_i = w_{LG} * (r_i - b_i),    (25)

where here the optimal smoothing filter w_LG is determined by the receptive fields k_i, the prior image correlation statistics, and the velocity [its explicit form is given in Eq. (A21) in Appendix A], as we discuss in more depth below.

Thus maximizing the marginal likelihood Eq. (24) is, to a good approximation, equivalent to maximizing the energy Eq. (21). The major difference between Eq. (21) and Eq. (24) is in the filter we apply to the spike trains: \tilde{r}_i has been replaced by R_i. The key point is that R_i depends on the stimulus filters k_i, the velocity v, and the image prior in an optimal manner, unlike the smoothing in Eq. (20). (Note that, while changes in optimal filters at differing light levels have been discussed in terms of motion estimation in fly vision [38], no account of varying light levels was taken here.) The dependence of this optimal filter on v can be explained fairly intuitively, as we discuss at more length in Appendix A following Eq. (A21). We find that \tau_w, the time scale of the smoothing filter w_LG, is dictated by three major time scales, some of which depend on the velocity v: \tau_k, the width of the time window in which each RGC integrates its input; l_k/v, where l_k is the spatial width of the receptive field; and l_corr/v, where l_corr is the correlation length of natural images. At low velocities, l_k/v and l_corr/v are large, and the smoothing time scale \tau_w is also large, since in this case we gain more information about the underlying firing rates by averaging over a longer time window. At high velocities, on the other hand, \tau_k dominates l_k/v and l_corr/v, and \tau_w \approx \tau_k. This setting of \tau_w makes sense because although the image movie I can vary quite quickly here, the filtered input J_i(t) induces a firing rate correlation time of order \tau_k, and examining the responses at a temporal resolution finer than \tau_k only decreases the effective signal-to-noise ratio.

Figure 2 illustrates these effects by plotting the optimal smoothing filters w_LG for several different values of the velocity v. Interestingly, in the high-velocity limit, the analytically derived optimal temporal filter width \tau_w is of the order of 10 ms, which was the value chosen empirically for the optimal Gaussian filter used in [4]. We recomputed the optimal empirical filter for our simulated data here by plotting the standard deviation of the velocity estimates obtained using the net motion signal [defined in Eq. (22)] against the filter width (Fig. 3). For this velocity (28.8°/s) the optimal filter width is of the order of 10 ms; thus, we used a filter of width 10 ms when comparing the energy method to the Bayesian decoder.

To summarize, maximizing the likelihood marginalized over the unknown image is very closely related to maximizing the energy function introduced by [4], if we replace the GLM with the simpler linear-Gaussian model. Since the actual spike train generation is much better modeled by the GLM than by the Gaussian model, we expect Bayesian velocity estimation (even with uncertain prior knowledge of the image) based on the correct GLM to be more accurate. This expectation is borne out by our simulations, though it is worth noting that the improvement is significantly smaller than when the Bayesian decoder has access to the exact image.

C. Simulations

We simulated the presentation of a bar moving across the

gray background of a CRT monitor refreshing at 120 Hz.

Fig. 2. Optimal linear spike train filter w_LG for a range of velocities from 0.2°/s to 28.8°/s (bottom) in exponential steps. The y axes are scaled in dimensionless units for clarity here. As discussed in Subsection 2.B.3, there are three time scales that determine the time scale of our filter w_LG. At low velocities, shown in the upper panels, the width of w(t) is determined by the two scales l_k/v and l_corr/v and is thus quite large (since the denominator v is small). At the higher velocities shown in the lower panels, the optimal filter width is dominated by the time scale of the receptive field \tau_k, and is of the order of \tau_k, which is ~10–20 ms. For even higher velocities the shape of this filter remains essentially the same as in the bottom panel.

The spatial profile of the bar in the direction of motion was a Gaussian function with a standard deviation (SD) of 96 µm. The visual field was represented by a grid of 100×100 pixels covering the receptive fields of two layers of cells, each arranged in a uniform 10×10 grid. One layer consisted of ON cells, while the other represented OFF cells. The pixel resolution used was 10 times that used in [7], resulting in a pixel size of 12 µm. The bar moved across the visual field in discrete steps of v pixels/refresh, although v was not restricted to integer values. On each trial, the bar traversed the entire visual field once at a constant velocity. (Therefore, low-velocity trials lasted longer than high-velocity trials; this will affect some of our analyses below.) Stimulus dimensions and speeds were converted to degrees/second using the approximation 200 µm/° [39] with a pixel size of 12×12 µm. This meant that, with a refresh rate of 120 Hz, a speed of 1 pixel/refresh corresponded to a speed of 7.2°/s.
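The unit conversion quoted above can be checked directly from the constants given in the text:

```python
# pixels/refresh -> degrees/second, using the constants given in the text
PIXEL_UM = 12.0          # pixel size, micrometers
REFRESH_HZ = 120.0       # CRT refresh rate
UM_PER_DEG = 200.0       # retinal magnification approximation from [39]

def speed_deg_per_s(pixels_per_refresh):
    return pixels_per_refresh * PIXEL_UM * REFRESH_HZ / UM_PER_DEG
```

For example, 1 pixel/refresh gives 12 µm × 120 Hz / (200 µm/°) = 7.2°/s, matching the value stated above.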

Then, to investigate the fidelity with which speed was encoded by our model, we ran simulations using a variety of stimulus parameter settings. Specifically, we conducted 100 trials at each of 48 stimulus conditions. These 48 conditions were made up of eight speeds (10.8, 14.4, 21.6, 28.8, 36.0, 43.2, 50.4, and 57.6°/s) by six luminance levels (0, 0.125, 0.25, 0.75, 0.875, and 1 on a grayscale where 0 is black and 1 is white; the background level was set at 0.5). We also refer to the six luminance levels in terms of the contrast of the bar with respect to the background. More precisely, we define the contrast as

  (I_bar - I_background) / I_background,

where I_bar and I_background denote the bar and the background luminance, respectively. These six luminance levels thus become contrast levels −1, −0.75, −0.5, 0.5, 0.75, and 1.
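The mapping from luminance levels to contrasts is a one-liner:

```python
def contrast(i_bar, i_background=0.5):
    # (I_bar - I_background) / I_background, as defined in the text
    return (i_bar - i_background) / i_background

levels = [0, 0.125, 0.25, 0.75, 0.875, 1]
contrasts = [contrast(i) for i in levels]
# -> [-1.0, -0.75, -0.5, 0.5, 0.75, 1.0], the six contrast levels above
```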

For each of these trials, we obtained a set of spike trains r. From these spike trains, it was possible to estimate the speed of the stimulus as one of a number of putative speeds. The putative speeds tested in our simulations ranged from 7.2 to 108°/s in steps of 0.36°/s. Thus, we could compare speed estimates across stimulus conditions by examining the SD of estimates across the 100 trials performed for each condition. As in [4], we focused on the fractional SD (SD of the velocity estimate divided by the true stimulus speed) to assess the fidelity of retinal speed signals, as any systematic bias in the speed estimate can in principle be compensated for by downstream processing. However, we will also present the dependence of the estimate bias on stimulus condition. As will be seen, the fractional bias and the fractional SD are roughly of the same order, and thus both contribute to the total root-mean-square fractional error of the velocity estimate; the latter is given by the square root of the sum of the squared fractional bias and the squared fractional SD. It should be noted that other contrast levels between −0.5 and 0.5 were also tested but are not presented, as for some combinations of decoder and speed, the velocity estimation performance at these low contrasts was not significantly above chance.
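The fractional error quantities defined above can be computed as follows; with the population SD, the root-mean-square fractional error decomposes exactly into bias and SD terms.

```python
import numpy as np

def fractional_error_stats(estimates, true_speed):
    """Fractional bias, fractional SD, and RMS fractional error of a set of
    speed estimates: RMS fractional error = sqrt(bias**2 + SD**2),
    as stated in the text."""
    est = np.asarray(estimates, dtype=float)
    frac_bias = (est.mean() - true_speed) / true_speed
    frac_sd = est.std() / true_speed          # population SD (ddof=0)
    frac_rmse = np.sqrt(frac_bias**2 + frac_sd**2)
    return frac_bias, frac_sd, frac_rmse
```

This is simply the usual bias-variance decomposition of the mean squared error, expressed in units of the true stimulus speed.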

As outlined above, we used three different decoding methods to estimate the stimulus velocity from the simulated spike train ensembles; we compared Bayesian velocity decoding with (optimal decoder) and without (marginal decoder) complete prior information about the image against velocity estimation using the energy method. In particular, in Subsection 3.A.4 we discuss the effect of prior image uncertainty on the performance of the Bayesian decoder in more detail. In order to parametrically vary the prior information available to the decoder, in the simulations used in that section the image was flashed a number of times to the cells while it was held fixed, and the image prior p(I) was updated according to the observed spike train data elicited by the flashes (no preview flashes were used in the simulations discussed in other sections). See Fig. 6(b) below for an illustration of this procedure. Short flashes were used instead of a continuous uninterrupted presentation because, in the latter case, the cells rapidly filter out the fixed image contrast, and thus after a brief interval (~20–30 ms) the spike trains cease to carry extra information about the image. The more times the image is flashed, the smaller the decoder's uncertainty C_x when the image starts moving. This allows the decoder to better estimate the velocity when it finally sees the same image in motion.

3. RESULTS

A. Comparison of the Different Velocity Decoders

In this section we compare the performance of the energy

model with the optimal and marginal decoders, as de-

scribed in Section 2. Figure 4(a) plots the velocity poste-

rior p?v?r,x? for the case of an a priori known image (the

moving bar described above) given a specific observed

population spike train r in response to the moving bar

stimulus as a function of putative stimulus speed v. Here,

the true stimulus speed was 36.0°/s. Figure 4(b) shows

the log of the marginal likelihood in the case where the

image is not completely known, and Fig. 4(c) shows the

value of the net motion signal N again as a function of pu-

tative speed and for the same stimulus. All three decoders

successfully estimated the speed in the trial shown; how-

ever, it is clear from the figure that the net motion signal

is much less sharply peaked around the stimulus speed

than for the Bayesian decoders.

Fig. 3. Effect of the filter width on the standard deviation of velocity estimates [obtained using the signal defined in Eq. (22)] across 100 presentations of a black bar moving at a speed of 28.8°/s across a gray background. Note that a filter width of ≈10 ms is optimal, in agreement with the findings of [4] and with the width of the optimal filter shown in Fig. 2.

B30 J. Opt. Soc. Am. A / Vol. 26, No. 11 / November 2009 / Lalor et al.

The consequences of these findings are reflected in the lower panels of Fig. 4, which show that the distribution of speed estimates across 100 presentations of a bar of contrast −0.5 moving

at a speed of 36.0°/s is most precise using the optimal de-

coder [Fig. 4(d)], rather than either the marginal decoder

(E) or the energy method (F). Also plotted are Gaussian fits to the distributions, with a mean ± SD of 35.76±0.81°/s for the optimal decoder, 37.1±3.05°/s for the marginal decoder, and 36±6.09°/s for the net motion signal. The fractional SD averaged across all conditions simulated in this study was 1.6% of the stimulus speed for the optimal decoder, 6.4% for the marginal decoder, and 10% of the stimulus speed for the energy method. Since the estimators are not unbiased, their root-mean-square error is larger than their SD, as the error receives a contribution from the bias as well. The root-mean-square fractional errors averaged across all stimulus conditions were 2%, 6.9%, and 11% for the optimal decoder, the marginal decoder, and the energy method, respectively. Because the velocity estimation based on the energy method does not make use of the image profile at any stage, these results are as expected, with the performance of the marginal decoder being intermediate between that of the optimal decoder and the energy method.
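The statement that the RMS error exceeds the SD for a biased estimator follows from the decomposition RMSE² = bias² + SD², which can be checked numerically. The per-trial speed estimates below are made up for illustration, not taken from the paper:

```python
import numpy as np

v_true = 36.0  # true stimulus speed (deg/s)
# Hypothetical per-trial speed estimates (illustrative, not the paper's data)
estimates = np.array([35.2, 36.8, 37.4, 34.9, 36.3, 37.0, 35.6, 36.9])

bias = estimates.mean() - v_true                    # systematic offset
sd = estimates.std()                                # spread about the estimates' own mean
rmse = np.sqrt(np.mean((estimates - v_true) ** 2))  # error about the true speed

# The RMSE decomposes exactly as rmse^2 = bias^2 + sd^2, so it exceeds
# the SD whenever the estimator is biased.
assert np.isclose(rmse ** 2, bias ** 2 + sd ** 2)

print(f"fractional SD  : {sd / v_true:.2%}")
print(f"fractional RMSE: {rmse / v_true:.2%}")
```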

1. Accuracy as a Function of Stimulus Speed

Because in our simulations the moving bar stimulus

makes only one pass over the visual field, more time is

spent traversing the field and more spike train informa-

tion is obtained for slower moving stimuli. Figure 5(a) il-

lustrates the fractional SD of 100 speed estimates for both

of the Bayesian methods and the energy method, at each

of the eight stimulus speeds, averaged across the six con-

trast levels. As expected, performance declines with in-

creasing speed for all three methods. The Bayesian decod-

ers provide more precise estimates than the energy

method at all speeds. Again, the advantage of the Baye-

sian decoder over the energy method is partly lost when

its prior information about the image is uncertain.

2. Accuracy as a Function of Stimulus Contrast

Lowering the contrast of the moving bar causes a reduc-

tion in the number of stimulus-related spikes generated

by the GLM model, according to Eqs. (2) and (3). As with

increasing stimulus speed, this obviously results in a re-

duction in stimulus-related information with which to es-

timate the stimulus speed. (Note that the model of [7]

lacks explicit luminance- or contrast-gain control effects;

thus, these results should be interpreted in terms of local

modifications around a fixed luminance pedestal that are

sufficiently small to avoid engaging classical luminance

gain-control mechanisms.) To examine this relationship,

we averaged the SD of the 100 speed estimates at each of

the six contrast levels across the eight stimulus speeds.

The results are shown in Fig. 5(b) and illustrate the ex-

pected increase in performance with increasing stimulus

contrast. Again, the Bayesian decoders clearly outperform

the energy method at all levels.

Fig. 4. Optimal decoder leads to the most precise velocity estimates, while the marginal decoder outperforms the energy-based “net motion signal” method, in terms of precision. (A) Posterior for the optimal decoder. (B) Log marginal likelihood for the marginal decoder. (C) Net motion signal N as a function of putative stimulus speed v for spike trains generated using a stimulus with speed 36.0°/s and contrast −0.5 for a trial where all methods successfully estimate the stimulus speed. It can be seen that the nonmarginal likelihood (A) is more sharply peaked around the stimulus speed than the marginal likelihood (B) and the net motion signal (C). Distribution of speed estimates across 100 presentations of a bar moving at a speed of 36.0°/s using (D) the optimal posterior probability, (E) the marginal likelihood, and (F) the net motion signal. Also plotted are Gaussian fits to the distributions with mean ± SD of 35.76±0.81 for the optimal decoder, 37.1±3.05 for the marginal decoder, and 36±6.09 for the net motion signal.

3. Effect of Contrast and Speed on Mean Speed Estimate

While we were primarily concerned with the precision of speed estimates in the current study, a number of well-researched visual phenomena concerning the relationship

between the mean visual speed perceived, i.e., the bias,

and the properties of the visual stimulus prompted us to

investigate this in our simulations. The first phenomenon

of interest was that in which humans tend to choose the

slowest motion that explains the incoming information

[40], i.e., we have a bias toward slower speeds. As can be

seen in Fig. 5(c), the energy method is biased toward

lower velocity estimates at higher stimulus speeds. The

optimal decoder shows a very slight tendency in this di-

rection also. On the other hand, the marginal decoder has

a positive bias toward higher velocities. The second phe-

nomenon of interest was that in which stimuli with low

contrast are typically perceived as moving slower than

those with high contrast [41,42]. Figure 5(d) plots the

fractional bias of the speed estimate, i.e., the difference between the mean estimated speed ⟨v*⟩ and the true stimulus speed v, normalized by v, against the stimulus contrast for both the Bayesian decoder and the energy method, averaged across all speeds tested in our simulations. There appears to be a

slight trend toward greater bias at low contrast, although

it should be noted that this is due to a strong bias at low

negative contrast, while at low positive contrast, the bias

is close to zero. The fact that the fractional SD of the

speed estimate at this low negative contrast value is so

large makes it difficult to say anything definitive about a

relationship between stimulus contrast and speed estimate bias. When considering the lack of any clear effect, it is important to remember that a uniform prior on velocity was used in this study.

4. Effect of Prior Image Information

Here we discuss the effect of preview flashes of the fixed

image on the velocity decoding performance of the mar-

ginal decoder. As mentioned above, the more times the

image is flashed or “shown” to the cells, the less will be

the decoder’s uncertainty about it and the better the ve-

locity estimate made by the decoder when it finally sees

the same image in motion. This effect is shown in Fig. 6

where panel (A) shows the decrease in the relative error

of the velocity estimate as the number of flashes increases. For a large number of flashes the error asymptotically reaches the level for the fully known image (shown by dashed curves). Panel (B) shows the convergence of the estimated luminance profile x_MAP to that of the actual bar image as the number of preview flashes increases. Figure 6 shows this effect for a particular stimulus velocity and contrast, but we note that the effect is qualitatively the same for all other values of these stimulus parameters; as the number of flashes increases beyond a few, the accuracy of the marginal decoder’s estimate approaches that of the optimal decoder’s.

Fig. 5. Fractional standard deviation of speed estimates versus (A) stimulus speed and (B) stimulus contrast for the Bayesian decoder with full image information (optimal decoder), the Bayesian decoder with incomplete image information (marginal decoder), and the energy method. (C), (D) plot the difference between the mean estimated speed ⟨v*⟩ and the true stimulus speed v, normalized by v, against the true stimulus speed and stimulus contrast, respectively. Note that the Bayesian decoder provides more precise estimates than the energy method at all levels, with performance improving with prior image information. Furthermore, it should be noted that for contrasts lower than ±0.5, particularly at high speeds, the dearth of information in the spike train ensemble resulted in the estimates from an inordinate number of trials being either the highest or lowest input putative speed. Accordingly, performance values for conditions in this range were not calculated and not plotted.

As seen here and above, the efficiency of the GLM-based Bayesian decoder can deteriorate significantly when the prior information about the image is too incomplete. As we showed in Subsection 2.B.3, Bayesian decoding with uncertain prior image information is, except for

the replacement of the GLM with the LG model, closely

related to the energy model. Indeed, in our simulations,

the disparity between the performances of the energy

model and those of the GLM-based Bayesian decoder was

largely lost when the latter decoder’s prior knowledge of

the image became too uncertain.

B. Effects of Manipulating Model Parameters

1. Importance of Correlation Between Cells

In order to investigate the importance of correlated activ-

ity between cells, we wished to remove the interaction be-

tween neighboring spike trains without reducing the

overall spiking rate. We used a straightforward trial-

shuffling approach: we generated 200 individual spike

trains, one for each cell, using 200 distinct presentations

of the stimulus to the full model. We then constructed a

single trial surrogate population spike train by serially

assigning each independent spike train recorded on simu-

lated trial i as the observed spike train in cell i. We re-

peated this 100 times to obtain spike ensembles repre-

senting 100 trials for each of the 48 conditions mentioned

above (i.e., eight different speeds and six different con-

trast levels). This allowed us to determine the fractional

standard deviation of the speed estimate for each of the

48 different stimulus conditions. It should be noted that

this (somewhat involved) procedure was carried out in

preference to simply removing the coupling between cells,

as that would have resulted in a different average num-

ber of population wide spikes compared with the output

from the full model, which would have had a confounding

effect on the results.
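This trial-shuffling construction can be sketched in a few lines. Here Bernoulli spike trains and small array sizes stand in for the 200 GLM-simulated trials used in the study:

```python
import numpy as np

rng = np.random.default_rng(1)
n_cells, n_bins = 5, 100  # toy sizes; the paper generated 200 independent trials

# spikes[trial, cell, bin]: one independently simulated trial per cell (toy data)
spikes = rng.random((n_cells, n_cells, n_bins)) < 0.05

# Surrogate population response: take cell i's spike train from trial i,
# destroying across-cell correlations while preserving each cell's own statistics.
surrogate = np.stack([spikes[i, i] for i in range(n_cells)])

for i in range(n_cells):
    # Per-cell spike counts are untouched by construction.
    assert surrogate[i].sum() == spikes[i, i].sum()
```

Because each cell’s train is drawn from a different trial, any correlation between cells that depends on shared trial-to-trial variability is removed, while the stimulus-locked firing of each cell is preserved.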

The results are shown in Figs. 7(a)–7(c) for the optimal

decoder, marginal decoder and the energy method, respec-

tively, and are plotted versus the fractional standard de-

viation of the speed estimate for the same 48 conditions

using the spike train ensembles obtained directly from

the model. The diagonal lines indicate equality between

the fractional SD of the speed estimates obtained using

the shuffled responses and that obtained directly from the

model. Somewhat surprisingly, given the significant cor-

relations in this data (cf. Fig. 2 in [7]), this trial-shuffling

procedure did not significantly hurt the performance of

any of the three velocity estimators. For the marginal de-

coder, there is a noticeable reduction in performance for

those stimulus conditions with more precise speed esti-

mates, while for conditions with higher fractional SD,

most points lie just below the line. However, for the other

two decoders, if anything, there is a slight bias in Fig. 7(a)

and 7(c), with data points tending to lie a bit below the

identity line in both plots, indicating that the shuffling

procedure happened to lead to velocity estimates with

slightly reduced variability. These results are consistent

with the conclusions of [2], that treating retinal ganglion

cells as independent encoders leads only to a minor loss of

information.

Fig. 6. Effect of decreasing image uncertainty on accuracy of Bayesian velocity estimation. See Subsection 2.C for a detailed description of this simulation. (A) The solid curve with error bars shows the drop in the fractional rms error of the velocity estimate for an a priori unknown image as the number of preview flashes increases. The dashed curve is the fractional error for the case of an a priori known image. The true velocity was 28.8°/s and the bar contrast 0.6. (B) The plots show the maximum a posteriori estimate of the image luminance profile (solid curve) in four trials with different numbers of preview flashes (indicated below each plot). The gray areas indicate the marginal uncertainty of the estimated luminance, and the dashed curve shows the actual image profile.


2. Timing Structure of Spike Trains

The question of whether cell spiking activity can be accu-

rately modeled as a simple Poisson process with a time-

varying rate or whether the intrinsic temporal structure

of retinal spike trains plays an important role in commu-

nication has a long history in systems neuroscience.

Simulations with the retinal ganglion cell model used in

this study have demonstrated that preserving the spike

history and cross-coupling effects can increase stimulus

decoding performance by up to 20% [7]. We wished to ex-

amine the effect of removing the specific timing informa-

tion of the individual spike trains. This was carried out

using the method of [4]. Specifically, we generated a spike

train for each cell for 100 trials of the moving bar stimu-

lus. We then randomly selected spike times for each cell,

with replacement, from that cell’s spike distribution [its

peri-stimulus time histogram (PSTH)], such that the

number of spikes in each resampled spike train was equal

to the average number of spikes in the corresponding

original spike trains. This results in a spike train for each

cell where spikes occur according to the marginal mean

firing rate only, with no consideration given to spike his-

tory effects such as action potential refractoriness. Note

that this process is even more disruptive of spike timing

information than the shuffling procedure described in the

previous subsection, since now we are destroying spike

train structure both between and within cells. Again, this

convoluted process was carried out in preference to simply removing the spike history filters h_ij from the model

before generating the spike trains, as removal of those fil-

ters would have resulted in a greater number of total

spikes and would thus have resulted in a misleadingly

good speed estimation performance. This process of gen-

erating a spike train ensemble through resampling was

carried out for each of the 48 stimulus conditions men-

tioned above.
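A sketch of this PSTH-based resampling follows, using a toy raster with an assumed Gaussian-bump rate profile (all parameters illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
n_trials, n_bins = 100, 200  # toy raster size

# Toy time-varying rate (Gaussian bump) and Bernoulli spike raster for one cell
rate = 0.02 + 0.10 * np.exp(-0.5 * ((np.arange(n_bins) - 80) / 15.0) ** 2)
spikes = rng.random((n_trials, n_bins)) < rate

psth = spikes.mean(axis=0)                        # marginal firing probability per bin
n_target = int(round(spikes.sum(axis=1).mean()))  # average spike count per trial

# Draw spike times i.i.d. with replacement from the PSTH: the resampled train
# reflects only the mean rate, with no refractoriness or history structure.
times = rng.choice(n_bins, size=n_target, p=psth / psth.sum())
resampled = np.bincount(times, minlength=n_bins)

assert resampled.sum() == n_target
```

Because the draws are independent, two spikes can land in the same bin or in adjacent bins, which a refractory cell would never produce; this is exactly the within-cell timing structure the manipulation is designed to destroy.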

The results are shown in Figs. 7(d)–7(f) for the optimal

decoder, marginal decoder and the energy method, respec-

tively, and are plotted versus the fractional standard de-

viation of the speed estimate for the same 48 conditions

using spike train ensembles obtained directly from the

model. Once again, the effects of this spike timing disrup-

tion on the performance of the velocity estimators was

fairly minimal, with the resampled spike trains appear-

ing to give a marginally worse performance as indicated

by the preponderance of data points slightly above the

identity line. Again, for the marginal decoder, the decreased performance is more pronounced for stimulus conditions with better performance.

3. Parameters of Cell Population

In the simulations above, two simple assumptions were

made about the parameters of the cell population. First,

the cells were arranged in an oversimplistic grid as in Fig.

8(a). And second, all ON cells were given a baseline log

firing rate [b_i in Eq. (2)] of 2 spikes/s and all OFF cells a

baseline log firing rate of 3 spikes/s, corresponding to the

mean values obtained when fitting the model [7]. In order

to examine a somewhat more biologically realistic case we

jittered the center location of the cells as in Fig. 8(b) and

randomly selected the baseline log firing rates of the ON and OFF cells from uniform distributions on the intervals 1 to 3 spikes/s and 1.5 to 4.5 spikes/s, respectively.

Fig. 7. Effect of correlated activity and spike timing structure on speed estimates. Fractional SD of speed estimates using shuffled responses plotted as a function of that obtained using regular simulated data for (A) the Bayesian decoder with full image information, (B) the Bayesian decoder with incomplete image information, and (C) the energy method. Fractional SD of speed estimates using resampled spike trains plotted as a function of that obtained using regular simulated data for (D) the Bayesian decoder with full image information, (E) the Bayesian decoder with incomplete image information, and (F) the energy method. Diagonal lines indicate equality. Each circle represents a different one of the 48 speed-by-contrast stimulus conditions. Note that the performance of the decoders is relatively unaffected by these rather drastic manipulations of spike timing.
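This randomized cell layout and baseline sampling can be sketched as follows; the grid spacing and jitter spread here are illustrative choices, not the fitted values:

```python
import numpy as np

rng = np.random.default_rng(3)

# Regular grid of cell centers (spacing illustrative)
grid = np.array([(x, y) for x in range(0, 100, 20)
                        for y in range(0, 100, 20)], dtype=float)

# Jitter each center around its grid position (spread is illustrative)
jittered = grid + rng.normal(0.0, 3.0, size=grid.shape)

# Baseline log firing rates drawn uniformly, per the ranges in the text
on_baselines = rng.uniform(1.0, 3.0, size=len(grid))   # ON cells
off_baselines = rng.uniform(1.5, 4.5, size=len(grid))  # OFF cells

assert jittered.shape == grid.shape
assert on_baselines.min() >= 1.0 and on_baselines.max() <= 3.0
```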

Figure 9 illustrates the speed estimates over 100 trials

for a stimulus with speed of 28.8°/s and contrast of −1 us-

ing the regular cell arrangement and uniform baseline log

firing rates (left column) versus the jittered cell arrange-

ment and random baseline log firing rates (middle col-

umn) for all three methods. The performances of the op-

timal decoder, marginal decoder and energy method are

shown in the top, middle, and bottom row, respectively.

No significant difference in performance between the

regular and jittered arrangements is apparent for any

method.

Fig. 8. (A) Simple rectangular grid cell arrangement. (B) Jittered cell arrangement.

Fig. 9. Histograms illustrating the velocity estimates over 100 trials for a stimulus with velocity 28.8°/s and contrast of −1 using the regular cell arrangement and uniform baseline log firing rates (left column) and the jittered cell arrangement and random baseline log firing rates (middle column). The top row represents the performance of the optimal Bayesian decoder, the middle row that of the marginal decoder, and the bottom row that of the energy method. Similar performance was obtained with both the rectangular-grid and randomized spatial layouts for all three methods. The right column illustrates the improved estimation performance obtained for all three methods by doubling the baseline log firing rates from 2 and 3 spikes/s to 4 and 6 spikes/s for the ON and OFF cells, respectively.

While randomly jittering the baseline log firing rates around the mean caused no obvious change in estimation accuracy, this does not allow us to comment on the possible effects of changes in the mean baseline log firing

rate. To assess this, we also carried out 100 simulations

using a stimulus with speed of 28.8°/s and a contrast of

−1, where the cells were arranged in the original simple

grid and the ON and OFF cells were given baseline log fir-

ing rates of 4 and 6 spikes/s, respectively. The right col-

umn of Fig. 9 illustrates the significantly improved esti-

mation performance obtained by inflating the baseline log

firing rates compared to the fitted values used throughout

the rest of this study for all three methods.

4. DISCUSSION

The model of [7] employed stochastic checkerboard

stimuli in order to accurately capture both the stimulus

dependence and detailed spatiotemporal correlation

structure of responses from a population of retinal gan-

glion cells. In this study, we have examined responses

from this model to a more behaviorally relevant coherent

velocity stimulus. Specifically, we have used these re-

sponses to assess how faithfully speed is encoded in a

population of neurons using an optimal Bayesian decoder,

with complete knowledge of the stimulus image. We have

also shown how to compute the Bayesian velocity esti-

mate in the case where we have only a limited amount of

information about the stimulus image, and how the Baye-

sian estimate in this case is closely related to a biologi-

cally plausible motion-energy-based method [36,43].

A connection between Bayesian velocity estimation and

the energy method of [36] has been noted before [12,15].

In that work, a Bayesian model of local motion informa-

tion was described. It was shown that this model could be

represented using a number of mathematical “building

blocks” that qualitatively resembled direction-selective

complex cells. Given that models of those cells have been

based on the energy method of [36], a link was drawn be-

tween the two methods. Furthermore, previous work has

sought to optimally estimate instantaneous motion from

spike train ensembles in the fly [38,44]. However, to the

best of our knowledge, in the case of nonlocal estimation

of rigid motion, the mathematical connection revealed

here between the energy method and the Bayesian

method based on the marginal image likelihood in the LG

case has not been previously described.

In terms of biological plausibility, it is unlikely that the

brain performs optimal Bayesian inference with full

knowledge of the image in order to estimate velocity. This

is supported by a recent study [8], which employed the energy method (Subsection 2.B.2) to examine the efficiency

of the code from a population of primate RGCs. They did

this by comparing the estimate of the velocity of a stimu-

lus using the spiking activity in the cell population with

psychophysical estimates made by human observers.

While the energy model consistently outperformed the

human observers, it was shown that at very brief presentation times (≲100 ms) the difference in estimation

performance between the energy method and the human

behavior was much smaller than at longer presentation

times. This suggests that readout of the retinal popula-

tion code can be extremely efficient when exposure to the

moving stimulus is very brief, but less efficient over long

trials when storing information over a long time is re-

quired by the optimal Bayesian decoder. In this study,

having used longer presentation times (125–675 ms), and

given that the optimal Bayesian decoder significantly out-

performs even the energy method, it seems clear that hu-

man observers do not decode using a known image in this

task. Instead, given the relationship presented here be-

tween the marginal decoder and the energy method, it ap-

pears that a strategy equivalent to marginalization over

the uncertain image seems to be more consistent with the

available data.

A couple of factors in the relationship between the mar-

ginal decoder and the energy method are worthy of fur-

ther discussion. First, the optimal filter width for velocity

estimation from cell population responses when using the

energy method on real data was reported to be of the or-

der of 10 ms [4]. This implies that the elementary motion

signal was conveyed with a time resolution comparable to

the interspike interval of RGCs. A similar filter width was

empirically shown using our simulated data (Fig. 3),

which was not unexpected given the large amount of vari-

ance captured by the model in peristimulus time histo-

grams in response to novel stimuli [7]. Of more interest,

however, the optimal filter derived analytically for our

(LG) marginal decoder is also shown to be of similar

width, at least in the case where stimulus velocities are

above about 5°/s (Fig. 2). This lends further weight to the

biological plausibility of the marginal decoder. The notion

that optimal filters based on stimulus filters, natural im-

age prior, and velocity could have a biological instantia-

tion seems reasonable. Second, in this study, we have as-

sessed the performance of the energy method relative to

the marginal decoder where the spike trains were gener-

ated not by a simple LG model, but by a GLM model. Be-

cause of the resulting improvement in spike train model-

ing we saw a significant improvement in velocity

estimation for the marginal decoder relative to the energy

method. Obviously the optimal decoder outperformed

both other methods given the extra image information

with which it was furnished.

In terms of performance specifics, the optimal Bayesian

decoder achieved an average relative precision of 2%

across all 48 stimulus conditions, with the marginal de-

coder achieving 6.4% and the energy method realizing

only 10% relative precision. It is interesting to compare

the estimation performance using our model to that ob-

tained using similar stimuli with real cells in [4]. The au-

thors of that study reported that the ensemble activity of

around 100 RGCs signaled speed with a precision of the

order of 1%. The fractional SD of 10% obtained using the same decoder on our model output spike trains is substantially larger, i.e., less precise, than that result. One likely reason for this is that our stimulus

range included much lower contrast stimuli. If we restrict

our precision estimate to those conditions that most

closely resemble those used by [4], i.e., speeds of 10.8,

14.4, 28.8, and 57.6°/s and contrast levels of −1 and 1, we

obtain a value of 2.8% using the energy method which is

of the same order as their result.

We examined the precision of our speed estimates as a

function of both stimulus speed and stimulus contrast. As

expected, decoding performance improves with increasing

contrast and with decreasing speed (Fig. 5). Figure 5(a) il-

lustrates that our model approximately followed a


Weber–Fechner law with visual speed discrimination be-

ing roughly proportional to speed [45]. As discussed in

Subsection 3.A.1, the faster the moving bar traverses the

retina, the less time spent stimulating the cells, and the

smaller the total number of spikes we have with which to

decode the stimulus speed. Similarly, the precision of the

speed estimate improves with increasing absolute con-

trast, which increases the effective signal-to-noise of the

retinal output [see Fig. 5(b)]. The nonlinear function f(·) used in Eq. (2) for this study was chosen to be exp(·). Given that in determining the firing intensities λ_i(t), this

function operates on the stimulus input (as well as the

baseline firing rates and spike history and cross-coupling

effects), any increase in stimulus contrast would be ex-

pected to have a strong impact on the stimulus-related fir-

ing rates; similar conclusions may be drawn from an

analysis of the Fisher information in this model [26].
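The multiplicative effect of contrast under an exponential nonlinearity can be seen in a stripped-down intensity computation. The spatial filter and stimulus below are toy values, and spike history and coupling terms are omitted:

```python
import numpy as np

# Stripped-down GLM-style intensity: lambda = exp(b + k . x), with spike
# history and coupling terms omitted; the filter and stimulus are toy values.
b = np.log(2.0)                   # baseline of 2 spikes/s
k = np.array([0.5, 1.0, 0.5])     # hypothetical spatial filter
stim = np.array([1.0, 1.0, 1.0])  # unit-contrast stimulus pattern

def rate(contrast):
    return np.exp(b + k @ (contrast * stim))

for c in [0.25, 0.5, 1.0]:
    print(f"contrast {c:.2f}: intensity {rate(c):.2f} spikes/s")

# Each additive increment dc in contrast multiplies the stimulus-driven rate
# by exp(dc * (k . x)): contrast acts multiplicatively through the exponential.
assert np.isclose(rate(0.5) / rate(0.25), np.exp(0.25 * (k @ stim)))
```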

As mentioned earlier, Bayesian modeling has been em-

ployed in a number of studies investigating how visual

speed perception is affected by properties of the visual

stimulus. In [16] an optimal Bayesian observer model was

used to examine human psychophysical data in terms of

stimulus noise characteristics and prior expectations.

They reported that the perception that low-contrast

stimuli move more slowly than high-contrast stimuli was

well modeled by an ideal Bayesian observer. This was be-

cause the broader likelihood (based on psychophysical

measurements), when multiplied by a prior favoring low

speeds [46], resulted in a larger shift toward zero than

multiplication by a narrower likelihood. In the present

study, a uniform prior was used for the speed of the mov-

ing bar. Thus, we would not expect a widening of the like-

lihood distribution by lowering the stimulus contrast to

shift the location of the posterior probability distribution.

As such, we would not expect any relationship between

stimulus contrast and the mean (or median) of the speed

estimate distribution. This appeared to be the case, with

no straightforward relationship seen to exist between

speed estimate bias and contrast [Fig. 5(d)]. There did ap-

pear to be a very slight trend toward greater bias to low

speeds at low contrasts for the energy method, but given

the much higher variance in the speed estimate at this

contrast [Fig. 5(b)], we are disinclined to draw any deeper

conclusions from these results.

In terms of a relationship between speed estimate bias and stimulus speed, however, our results indicate a clear trend. Specifically, there appears to be a systematic bias in speed estimation tending to underestimate speed at high stimulus velocities for both the energy method and the Bayesian decoder with known image, while tending to overestimate speed at the same high stimulus velocities for the Bayesian decoder with uncertain image [Fig. 5(c)]. This can be explained by the well-known fact that likelihood-based estimators can display bias in low-information settings (as the high-speed setting is here, since effectively less time is available to observe spiking data during the stimulus presentation). In the low-speed, high-information setting, the bias of the likelihood-based estimator is negligible, as expected. The discrepancy between the biases of the marginal decoder and the energy-based estimate is clarified by the connection between these two methods as described in Subsection 2.B.3 and Appendix A; specifically, see the discussion after Eqs. (24) and (25) of Subsection 2.B.3 and Eqs. (A7) and (A8) of Appendix A.

In [7] it was found that, when comparing the full RGC model with an uncoupled version (retaining spike history effects), Bayesian stimulus decoding recovered 20% more information (about the spatiotemporal light intensity profile) using pseudorandom stimuli. The authors also noted that additionally ignoring spike history effects further reduced the recovered information by 6%. Thus, we wished to examine the importance of correlations between cells and of the intrinsic timing structure of the spike trains for speed estimation precision. We followed the procedure employed in [4] and, as in that study, it appeared that, for most stimulus conditions, the shuffled, uncorrelated spike trains surprisingly resulted in a weak improvement in estimation precision. The one exception to this was in the case of the marginal decoder, where for high-information stimulus conditions (i.e., low speeds and high contrast), the shuffled spike trains resulted in a significant reduction in performance. We also replicated their test of how precise spike timing might affect speed estimation precision [4]. Again, as in their study, we found only a very slight reduction in performance for the optimal Bayesian decoder and the energy method. Similar to the reshuffled case, the deterioration in the performance of the marginal decoder was more pronounced. Given that we have completely abolished the intraneuronal and interneuronal nonstimulus-driven correlation structure here, these small decreases in performance indicate that velocity decoding does not depend strongly on the fine spike train structure, at least in the very simple case of a moving bar.

It should be noted that for the results plotted in Fig. 7, all spike train ensembles were decoded using the full model. That is, coupling filters and spike history effects were assumed and accounted for when calculating λ_i in the decoding step. Given that coupling effects were removed by our shuffling procedure and that both coupling effects and spike history effects were removed by our resampling procedure, it is possible that decoding the spike trains with an appropriately reduced model might provide more accurate speed estimation for these manipulated spike train ensembles. To that end, we used a model without coupling filters to decode the speed of the shuffled spike train ensembles and a model with all h_{ij} set to zero to decode the speed of the resampled spike train ensembles. It is interesting to note that incorporating this knowledge about the presence or absence of cell coupling and spike history effects into the decoding made no qualitative difference to the accuracy of the estimated velocity (not shown).
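A minimal sketch of the kind of trial-shuffling manipulation referred to above (the spike-count array and its dimensions are hypothetical, and the exact procedure of [4] may differ in detail): permuting the trial index independently for each cell abolishes across-cell noise correlations while leaving each cell's own trial-by-trial statistics, and hence its PSTH, exactly intact.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical spike-count array: trials x cells x time bins.
counts = rng.poisson(lam=2.0, size=(50, 8, 100))

# Shuffle: permute the trial index independently for each cell, which
# destroys across-cell noise correlations but preserves every cell's
# own single-trial statistics (and therefore its PSTH exactly).
shuffled = np.empty_like(counts)
for c in range(counts.shape[1]):
    shuffled[:, c, :] = counts[rng.permutation(counts.shape[0]), c, :]

# Each cell's PSTH (mean over trials) is unchanged by the shuffle.
assert np.allclose(shuffled.mean(axis=0), counts.mean(axis=0))
```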

5. CONCLUSION

Optimal Bayesian decoding with full image information has been shown to outperform a “motion energy” method that uses no prior image information. A method for performing Bayesian decoding without full image information has been described and has demonstrated performance intermediate between that of the optimal decoder and the energy method. All of these methods appear to outperform human psychophysical performance [8], particularly in experiments in which the motion stimulus was visible for an extended period of time. A mathematical description of the connection between the Bayesian decoder with less than full image information and the energy method indicates that, in addition to the extra information about the image used by the Bayesian estimator, information about the network’s spatiotemporal stimulus filtering properties also plays an important role in optimal velocity estimation. The results of a number of simulations indicate a good correspondence between the speed encoding performance of the model and that of a population of real RGCs. This work thus provides a rigorous framework with which to explore the factors limiting the estimation of velocity in vision. Future work will seek to utilize these methods to investigate motion decoding using more complex stimuli moving in nontranslational ways, perhaps incorporating real-world issues such as occlusions and accelerations. Also, we aim to employ the methods to investigate real data.

Lalor et al., Vol. 26, No. 11 / November 2009 / J. Opt. Soc. Am. A, B37

APPENDIX A: MARGINAL LIKELIHOOD IN THE LINEAR GAUSSIAN MODEL

In this appendix we show that the logarithm of the marginal likelihood p(r|v) for a simple LG model of the RGCs is closely related to the energy function of [4], and thus for this model the Bayesian velocity decoding is nearly equivalent to the energy method. In the linear Gaussian model, the response of cell i, r_i, is given linearly in terms of the image intensity profile x up to additive Gaussian noise with covariance Λ, as in Eq. (23). Thus we have

\[
p_{\mathrm{LG}}(r|x,v) = \prod_i \mathcal{N}\!\left(b_i + K_{i,v}\cdot x,\ \Lambda\right).
\]

Using this and p_x(x) = N(0, C_x) as the Gaussian image prior, we repeat the steps in Eqs. (11)–(19) of Subsection 2.B.1. For the LG model, the log-posterior function is given by

\[
L_{\mathrm{LG}}(x,r,v) \equiv \log[p_x(x)] + \log[p_{\mathrm{LG}}(r|x,v)]
= -\frac{1}{2}\, x^{\mathsf{T}} C_x^{-1} x
- \frac{1}{2} \sum_i (r_i - b_i - K_{i,v}\cdot x)^{\mathsf{T}} \Lambda^{-1} (r_i - b_i - K_{i,v}\cdot x)
+ \mathrm{const.}
\tag{A1}
\]

instead of Eq. (11), and the marginal distribution p_LG(r|v) by

\[
p_{\mathrm{LG}}(r|v) = \int e^{L_{\mathrm{LG}}(x,r,v)}\, dx,
\tag{A2}
\]

similar to Eq. (10). As before, setting ∇_x L_LG = 0 yields the equation for x_MAP, which unlike Eq. (19) is linear, and can be easily solved to yield

\[
x_{\mathrm{MAP}}(r,v) = H(v)^{-1} \sum_i K_{i,v}^{\mathsf{T}} \cdot \Lambda^{-1} \cdot (r_i - b_i).
\tag{A3}
\]

Here, the negative Hessian is given by

\[
H(v) = -\nabla_x \nabla_x L_{\mathrm{LG}} = C_x^{-1} + \sum_i K_{i,v}^{\mathsf{T}} \cdot \Lambda^{-1} \cdot K_{i,v},
\tag{A4}
\]

which is now independent of the observed spike trains r.
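Equations (A3) and (A4) amount to a ridge-regression-style linear solve. A toy numpy check (all matrices random and purely hypothetical stand-ins for K_{i,v}, Λ, and C_x) confirms that the gradient of the log posterior vanishes at x_MAP:

```python
import numpy as np

rng = np.random.default_rng(1)
d, T, ncells = 6, 12, 3                       # toy dimensions (hypothetical)

Cx = np.eye(d) * 0.5                          # prior image covariance
Lam_inv = np.eye(T) / 0.1                     # white noise: Lambda = 0.1 * I
K = [rng.standard_normal((T, d)) for _ in range(ncells)]
b = [rng.standard_normal(T) for _ in range(ncells)]
r = [rng.standard_normal(T) for _ in range(ncells)]

# Negative Hessian, Eq. (A4): H = Cx^{-1} + sum_i K_i^T Lambda^{-1} K_i
H = np.linalg.inv(Cx) + sum(Ki.T @ Lam_inv @ Ki for Ki in K)

# MAP image, Eq. (A3): x_MAP = H^{-1} sum_i K_i^T Lambda^{-1} (r_i - b_i)
x_map = np.linalg.solve(H, sum(Ki.T @ Lam_inv @ (ri - bi)
                               for Ki, bi, ri in zip(K, b, r)))

# Gradient of the log posterior L_LG, which must vanish at x_MAP:
grad = -np.linalg.inv(Cx) @ x_map + sum(Ki.T @ Lam_inv @ (ri - bi - Ki @ x_map)
                                        for Ki, bi, ri in zip(K, b, r))
assert np.allclose(grad, 0.0, atol=1e-8)
```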

Using Eqs. (A3) and (A4), we can rearrange the terms in Eq. (A1) to complete the square for x, and obtain

\[
L_{\mathrm{LG}}(x,r,v) = -\frac{1}{2}(x - x_{\mathrm{MAP}})^{\mathsf{T}} H(v)\,(x - x_{\mathrm{MAP}})
- \frac{1}{2} \sum_i \delta r_i^{\mathsf{T}} \Lambda^{-1} \delta r_i
+ \frac{1}{2} \sum_{ij} X_i^{\mathsf{T}} C_x(v)\, X_j + \mathrm{const.},
\tag{A5}
\]

where C_x(v) = H^{-1}(v) is the posterior covariance over the fixed image, and we defined the mean-adjusted response δr_i ≡ r_i − b_i and the prefiltered response

\[
X_i \equiv K_{i,v}^{\mathsf{T}} \Lambda^{-1} \delta r_i.
\tag{A6}
\]

The marginalization in Eq. (A2) is thus a standard Gaussian integration, which yields

\[
\log p_{\mathrm{LG}}(r|v) = \frac{1}{2} \sum_{ij} X_i^{\mathsf{T}} C_x(v)\, X_j
- \frac{1}{2} \log\!\left|C_x H(v)\right| + \mathrm{const.}
\tag{A7}
\]

(the constant term is independent of v, and therefore irrelevant for estimating it). The decomposition into the two terms on the right-hand side of Eq. (A7) is similar to that in Eq. (18). In both equations the second term arose from a Gaussian integration over x [an approximation in the case of Eq. (18)], and the first was (up to a constant in v) the value of the logarithm of the joint distribution of x and r, given v, at x_MAP(r,v). Unlike Eq. (18), however, although the second term on the right-hand side of Eq. (A7) depends on v, it is nevertheless independent of the observed response r. The only term that modulates the velocity posterior depending on r (through the implicit dependence of X_i) is the first, which we denote by E_LG(v,r). We will see that this term corresponds closely to the energy function introduced in [4]. More explicitly, we have

\[
E_{\mathrm{LG}}(v,r) \equiv \frac{1}{2} \sum_{ij} X_i^{\mathsf{T}} C_x(v)\, X_j
= \frac{1}{2} \sum_{ij} \iint X_i(\mathbf{n}_1)\, C_x(\mathbf{n}_1,\mathbf{n}_2; v)\, X_j(\mathbf{n}_2)\, d^2 n_1\, d^2 n_2.
\tag{A8}
\]

In the following we will rewrite Eq. (A8) in a form which is explicitly akin to Eq. (21). For simplicity, we assume that the noise covariance is white, i.e., Λ = σ²𝟙. Physiologically, this implies that we are ignoring stimulus-conditional correlations and history dependences in the network (as, e.g., in the uncoupled model discussed in [7]). From Eq. (A6) and the definition of K_{i,v}, Eq. (7), we then obtain the explicit form

\[
X_i(\mathbf{n}) = \frac{1}{\sigma^2} \int dt \int d\tau\, k_i(t - \tau,\ \tau \mathbf{v} + \mathbf{n})\, \delta r_i(t).
\tag{A9}
\]

If we further assume that the spike train observation has not revealed much information about the identity of the fixed image (as happens, e.g., for low contrasts or short presentation times), then the posterior distribution over x will not be very different from the prior p_x(x). Therefore, we can use the approximation C_x(v) ≈ C_x. In the one-dimensional case, which we are studying in this paper, the image profile x(n), and hence the prior image covariance, depend only on the component of n parallel to the direction of motion v̂ = v/|v| and are constant in the perpendicular direction. Denoting the former component by n_∥ = n·v̂ and the latter by n_⊥ = n − n_∥v̂, we can then perform the integrals over n_⊥ in Eq. (A8) and rewrite it as

\[
E_{\mathrm{LG}}(v,r) = \frac{1}{2} \sum_{ij} \iint \tilde{X}_i(n_1)\, C_x(n_1, n_2)\, \tilde{X}_j(n_2)\, dn_1\, dn_2,
\tag{A10}
\]

\[
\tilde{X}_i(n) \equiv \int X_i(\mathbf{n})\, dn_\perp
= \frac{1}{\sigma^2} \int dt \int d\tau\, \tilde{k}_i(t - \tau,\ \tau v + n)\, \delta r_i(t),
\tag{A11}
\]

where v ≡ |v|, and we defined k̃_i(t, n_∥) ≡ ∫ k_i(t, n) dn_⊥. For each cell i, we specify a fixed point n_i positioned at its receptive field center, so that k_i(t, n_i + δn) vanishes when |δn| gets considerably larger than the size of the receptive field surround (∼1°). Hence, if we define

\[
q_i(t, n) \equiv \tilde{k}_i(t, n + n_i) = \int k_i(t, \mathbf{n} + \mathbf{n}_i)\, dn_\perp
\tag{A12}
\]

(where n_i ≡ n_i·v̂), q_i(t,n) vanishes when |n| ≫ 1°; for all cells, the q_i are localized (up to the above scale) around the origin, as opposed to around the position of their respective receptive field centers along v. In order to make the comparison with the energy model of Subsection 2.B.2 clearer, we also switch to the time domain (recalling that space n and time t are linked here via the velocity v); we define R̃_i(t) ≡ X̃_i(n_i − vt) [equivalently, X̃_i(n) = R̃_i((−n + n_i)/v)] and rewrite Eq. (A10) by changing the integration variables from n_{1(2)} to vt_{1(2)}:

\[
E_{\mathrm{LG}}(v,r) = \frac{1}{2v^2} \sum_{ij} \iint
\tilde{R}_i\!\left(-t_1 + \frac{n_i}{v}\right)
C_x(v t_1, v t_2)\,
\tilde{R}_j\!\left(-t_2 + \frac{n_j}{v}\right) dt_1\, dt_2.
\tag{A13}
\]

Using Eq. (A11) and the definition (A12), we write R̃_i(t₁) explicitly as

\[
\tilde{R}_i(t_1) \equiv \tilde{X}_i(n_i - v t_1)
= \frac{1}{\sigma^2} \int dt \int d\tau\, \tilde{k}_i(t - \tau,\ \tau v - v t_1 + n_i)\, \delta r_i(t)
= \frac{1}{\sigma^2} \int dt \int d\tau\, q_i(t - \tau,\ v(\tau - t_1))\, \delta r_i(t).
\tag{A14}
\]

Exploiting the translation invariance of the prior image ensemble, which dictates C_x(n₁, n₂) = C_x(n₁ − n₂), we define B_x to be the operator square root of C_x, in the sense that

\[
C_x(n_1 - n_2) = \int B_x(n_1 - n)\, B_x(n_2 - n)\, dn.
\tag{A15}
\]

In general, given an explicit form of C_x(n₁ − n₂), B_x can be computed in the Fourier domain by taking the square root of the power spectrum [29]. In particular, for C_x(n₁ − n₂) = c² e^{−|n₁−n₂|/l_corr}, we have B_x(n) = c√(2/l_corr) θ(n) e^{−n/l_corr}, where c is the image contrast, l_corr is the correlation length of typical images in the naturalistic prior ensemble, and θ(t) is the Heaviside step function. In the simulations of Subsection 3.A.4 we used this particular form of C_x, as it yields (for spatial frequencies f larger than the inverse of the correlation length l_corr but smaller than the inverse image pixel size) a power spectrum ∝ 1/f², as observed in natural images. Substituting definition (A15) (after renaming the integration variable n to vt) in Eq. (A13), we rewrite the latter as

\[
E_{\mathrm{LG}}(v,r)
= \frac{1}{2v} \sum_{ij} \iiint
\tilde{R}_i\!\left(-t_1 + \frac{n_i}{v}\right) B_x(v(t_1 - t))\,
B_x(v(t_2 - t))\, \tilde{R}_j\!\left(-t_2 + \frac{n_j}{v}\right) dt_1\, dt_2\, dt
= \frac{1}{2v} \sum_{ij} \iiint
\tilde{R}_i(t_1)\, B_x\!\left(v\!\left(t + \frac{n_i}{v} - t_1\right)\right)
B_x\!\left(v\!\left(t + \frac{n_j}{v} - t_2\right)\right) \tilde{R}_j(t_2)\, dt_1\, dt_2\, dt.
\tag{A16}
\]
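The Fourier-domain construction of B_x just described can be sketched with a minimal periodic (circulant) example; the contrast and correlation length are hypothetical, and note that the symmetric root computed here is a different, equally valid factorization than the causal (Heaviside) root quoted in the text.

```python
import numpy as np

d = 256                       # grid points (periodic approximation)
dn = 0.1                      # grid spacing
c, l_corr = 1.0, 2.0          # contrast and correlation length (hypothetical)

# First row of the (circulant) exponential covariance C_x(n1 - n2).
n = np.arange(d) * dn
dist = np.minimum(n, d * dn - n)              # periodic distance
C_row = c**2 * np.exp(-dist / l_corr)

# Operator square root via the power spectrum: |B_hat|^2 = C_hat.
C_hat = np.fft.fft(C_row).real                # real by symmetry, nonnegative
B_row = np.fft.ifft(np.sqrt(np.maximum(C_hat, 0.0))).real

# Check Eq. (A15): the (circular) autocorrelation of B_x reproduces C_x.
C_rebuilt = np.fft.ifft(np.abs(np.fft.fft(B_row))**2).real
assert np.allclose(C_rebuilt, C_row, atol=1e-8)
```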

We derived the last line by renaming the integration variables as t₁ → n_i/v − t₁, t₂ → n_j/v − t₂, and t → −t. Finally, defining

\[
R_i(t) \equiv \frac{1}{\sqrt{v}} \int B_x(v(t - t_1))\, \tilde{R}_i(t_1)\, dt_1,
\tag{A17}
\]

we obtain

\[
E_{\mathrm{LG}}(v,r) = \frac{1}{2} \sum_{ij} \int R_i\!\left(t + \frac{n_i}{v}\right) R_j\!\left(t + \frac{n_j}{v}\right) dt.
\tag{A18}
\]

Equation (A18) is akin to the energy function used in [4], and together with Eq. (A7) yields Eq. (24) of Subsection 2.B.3. To find the explicit form of the smoothing filter in Eq. (25), we compare that equation, in the form

\[
R_i(t) = \int w_{\mathrm{LG}}(t - t')\, \delta r_i(t')\, dt',
\tag{A19}
\]

with definition (A17):

\[
R_i(t) = \frac{1}{\sigma^2 \sqrt{v}} \int dt_1 \int dt' \int d\tau\,
B_x(v(t - t_1))\, q_i(t' - \tau,\ v(\tau - t_1))\, \delta r_i(t')
= \frac{1}{\sigma^2 \sqrt{v}} \int dt_1 \int dt' \int d\tau\,
B_x(v(t - t' - t_1))\, q_i(-\tau,\ v(\tau - t_1))\, \delta r_i(t')
\tag{A20}
\]

[where we used Eq. (A14) to write the first line, and shifted τ and t₁ by t′ to derive the second], and obtain

\[
w_{\mathrm{LG}}(t) = \frac{1}{\sigma^2 \sqrt{v}} \int dt_1 \int d\tau\,
B_x(v(t - t_1))\, q_i(-\tau,\ v(\tau - t_1)).
\tag{A21}
\]

Thus, R_i(t) is a version of the response function of the cell i, offset by its baseline log firing rate b_i and smoothed out on the time scale dictated by the largest of the spatiotemporal scales of the receptive fields (via q_i) or the correlation length of typical images (via B_x), with spatial scales converted to time scales by dividing by v. To see this more precisely, let us define Δτ₁ ≡ τ, Δτ₂ ≡ t₁ − τ, and Δτ₃ ≡ t − t₁, such that t = Δτ₁ + Δτ₂ + Δτ₃. On the other hand, because of the finite support of the factors of its integrand, the double integral in Eq. (A21) receives nonzero contributions only when |Δτ₁| ≲ τ_k, |Δτ₂| ≲ l_k/v, and |Δτ₃| ≲ l_corr/v [where τ_k and l_k are the typical temporal and spatial size of the receptive field filters k_i(t,n), respectively, and l_corr is the correlation length of typical images in the naturalistic prior ensemble]. Thus if |t| = |Δτ₁ + Δτ₂ + Δτ₃| is much larger than the sum of the three scales τ_k, l_k/v, and l_corr/v, the filter w(t) is bound to vanish. This leads to the discussion of Subsection 2.B.3 following Eq. (25).
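The pipeline of Eqs. (A17)-(A19) (smooth each cell's mean-adjusted response, shift it in time by its receptive-field offset n_i/v, and integrate the summed pairwise products) can be sketched as follows; the responses, smoothing filter, and receptive-field centers are all hypothetical stand-ins.

```python
import numpy as np

rng = np.random.default_rng(3)
dt = 0.01
t = np.arange(0, 2, dt)                      # time grid (s)
n_centers = np.array([0.5, 1.0, 1.5])        # RF centers along motion (deg)

# Hypothetical mean-adjusted responses delta r_i(t) and smoothing filter w.
dr = rng.standard_normal((len(n_centers), len(t)))
w = np.exp(-np.arange(0, 0.2, dt) / 0.05)    # causal exponential smoother

def energy(v, dr, w, n_centers, t, dt):
    """E(v) = 1/2 sum_ij int R_i(t + n_i/v) R_j(t + n_j/v) dt, Eq. (A18),
    with R_i the smoothed response of Eq. (A19)."""
    R = np.array([np.convolve(d, w, mode="same") for d in dr])
    # Shift each cell's response by its RF center converted to time, n_i / v.
    shifted = np.array([np.interp(t + ni / v, t, Ri, left=0.0, right=0.0)
                        for ni, Ri in zip(n_centers, R)])
    total = shifted.sum(axis=0)
    # sum_ij int R_i R_j dt = int (sum_i R_i)^2 dt, so E(v) >= 0.
    return 0.5 * np.sum(total**2) * dt

E = np.array([energy(v, dr, w, n_centers, t, dt) for v in [0.5, 1.0, 2.0]])
assert np.all(np.isfinite(E)) and np.all(E >= 0.0)
```

In a full decoder, E(v) (or the marginal posterior it approximates) would be evaluated on a grid of candidate velocities and maximized.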

APPENDIX B: O(d) DECODING

Here, we discuss how to implement the Newton–Raphson optimization algorithm such that it finds the maximum a posteriori estimate for the image x_MAP [satisfying Eq. (19)] in cases where the image depends only on one spatial dimension, and in a computational time that scales only linearly with the spatial dimensionality of the image vector d. The Newton–Raphson algorithm for minimizing the function L(x) works as follows [we have in mind the objective function defined in Eq. (13), but for simplicity we drop r and v from its arguments in this appendix]. At each iteration of this algorithm, starting from the vector x, we change this vector by an amount δx which is found by solving the set of linear equations H(x)δx = ∇_x L(x). Here, the right-hand side is the gradient of L(x), and H(x) is its negative Hessian matrix [as in Eq. (15)], both evaluated at x. In general, the solution of a set of d linear equations can be calculated in O(d³) elementary operations [47]. This would make the decoding of images with even moderate angular extension forbidding. Fortunately, as we will now explain, the quasi-locality of the GLM allows us to overcome this limitation. The negative Hessian of L(x), Eq. (13), is given by

\[
H(x) = C_x^{-1} + \sum_{i,t} J_{i,t}(x),
\tag{B1}
\]

where the matrices J_{i,t}(x) have the elements

\[
J_{i,t}(n_1, n_2; x) = K_{i,v}(t; n_1)\, K_{i,v}(t; n_2)\, \lambda_i(t; x)\, dt,
\tag{B2}
\]

and K_{i,v}(t;n) was defined in Eq. (7) in terms of the receptive field filter of the cell i, k_i(τ,n) [as we are considering the one-dimensional image case, k_i(τ,n) is understood to be the full receptive field integrated along the transverse spatial dimension]. Here, we turned the integral over t in Eq. (13) into a discrete sum in Eq. (B1), as is done in the numerical implementation, and for simplicity we wrote Eq. (B2) for the case of exponential GLM nonlinearity. Generalization of this equation and the rest of the argument to general nonlinearities is straightforward. Because of the finite spatial size of the receptive field and the finite duration of the temporal filter, k_i(τ,n) is nonzero only when τ ∈ [0, T_k] and n ∈ [n_min^i, n_max^i], where T_k and Δn ≡ n_max^i − n_min^i are upper bounds on the cells' temporal integration windows and the size of the cells' receptive field surrounds, respectively. It follows then from Eq. (7) that K_{i,v}(t;n) vanishes unless n_min^i − vt ≤ n ≤ n_max^i − vt + vT_k (we assumed v > 0, but generalization to v < 0 is straightforward). Thus J_{i,t}(n₁,n₂;x) vanishes if |n₁ − n₂| > Δn + vT_k, regardless of i and t. In other words, for all (i,t), the J_{i,t}(x) are banded matrices with a bandwidth of Δn + vT_k, and so is their sum. If we further use a prior covariance C_x with a banded inverse, then the full Hessian, Eq. (B1), will be banded [e.g., the naturalistic prior covariance introduced after Eq. (A15) can be defined as the inverse of a tridiagonal matrix in the numerical implementation].

Unlike in the general case, the solution of a set of d linear equations with a banded equation matrix of bandwidth B can be found in a computational time ∝ B²d, i.e., in our case, in a computational time scaling only linearly (as opposed to cubically) with the image size d. On the other hand, we have observed empirically that the number of necessary Newton–Raphson iterations is more or less constant and does not scale with d. Hence the overall optimization procedure for finding x_MAP can be performed in O(d) computational time. This allows us to decode the velocity of large moving images. Similar methods with O(d) computational cost have been used in inference and estimation problems involving state–space models [48,49], e.g., in applications to neural data analysis [50].
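The banded Newton step can be carried out with a standard banded Cholesky solver. In the sketch below the tridiagonal-plus-banded-outer-products Hessian is a hypothetical stand-in for C_x^{-1} + Σ J_{i,t}; the dense copy of H exists only to verify the banded solve, which itself costs O(B²d).

```python
import numpy as np
from scipy.linalg import solveh_banded

rng = np.random.default_rng(4)
d, half_bw = 200, 3                           # image size; band rows (u + 1)

# Banded SPD "Hessian": a tridiagonal prior-inverse stand-in plus banded
# outer products standing in for sum_{i,t} J_{i,t}; held densely here
# only so the banded answer can be checked against np.linalg at the end.
H = (np.diag(np.full(d, 4.0))
     + np.diag(np.full(d - 1, -1.0), 1)
     + np.diag(np.full(d - 1, -1.0), -1))
for _ in range(300):
    i = rng.integers(0, d - half_bw)
    u = np.zeros(d)
    u[i:i + half_bw] = rng.standard_normal(half_bw)
    H += np.outer(u, u)                       # keeps |n1 - n2| <= half_bw - 1

g = rng.standard_normal(d)                    # gradient of L at the current x

# Pack the upper triangle of H into banded storage: ab[u + i - j, j] = H[i, j].
ab = np.zeros((half_bw, d))
for k in range(half_bw):
    ab[half_bw - 1 - k, k:] = np.diag(H, k)

# Newton step H dx = g via banded Cholesky, O(B^2 d) instead of O(d^3).
dx = solveh_banded(ab, g)
assert np.allclose(H @ dx, g)
```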

ACKNOWLEDGMENTS

Thanks to E. P. Simoncelli for very detailed comments on the manuscript, to J. Pillow for providing us with the parameters for the network model introduced in [7], and to D. Pfau and E. J. Chichilnisky for many useful comments. YA and LP are partially supported by NEI grant R01 EY018003 and by a McKnight Scholar award to LP. YA is additionally supported by a Patterson Trust Fellowship in Brain Circuitry. EL is supported by an Irish Research Council for Science, Engineering and Technology (IRCSET) Government of Ireland Postdoctoral Research Fellowship.

REFERENCES

1. M. Meister, L. Lagnado, and D. Baylor, “Concerted signaling by retinal ganglion cells,” Science 270, 1207–1210 (1995).
2. S. Nirenberg, S. Carcieri, A. Jacobs, and P. Latham, “Retinal ganglion cells act largely as independent encoders,” Nature 411, 698–701 (2002).
3. E. Chichilnisky and R. Kalmar, “Functional asymmetries in ON and OFF ganglion cells of primate retina,” J. Neurosci. 22, 2737–2747 (2002).
4. E. Frechette, A. Sher, M. Grivich, D. Petrusca, A. Litke, and E. Chichilnisky, “Fidelity of the ensemble code for visual motion in the primate retina,” J. Neurophysiol. 94, 119–135 (2005).
5. E. Schneidman, M. Berry, R. Segev, and W. Bialek, “Weak pairwise correlations imply strongly correlated network states in a neural population,” Nature 440, 1007–1012 (2006).
6. J. Shlens, G. Field, J. Gauthier, M. Grivich, D. Petrusca, A. Sher, A. Litke, and E. Chichilnisky, “The structure of multi-neuron firing patterns in primate retina,” J. Neurosci. 26, 8254–8266 (2006).
7. J. Pillow, J. Shlens, L. Paninski, A. Sher, A. Litke, E. Chichilnisky, and E. Simoncelli, “Spatio-temporal correlations and visual signalling in a complete neuronal population,” Nature 454, 995–999 (2008).
8. E. S. Frechette, M. I. Grivich, R. S. Kalmar, A. M. Litke, D. Petrusca, A. Sher, and E. J. Chichilnisky, “Retinal motion signals and limits on speed discrimination,” J. Vision 4, 570 (2004).
9. A. Litke, N. Bezayiff, E. Chichilnisky, W. Cunningham, W. Dabrowski, A. Grillo, M. Grivich, P. Grybos, P. Hottowy, S. Kachiguine, R. Kalmar, K. Mathieson, D. Petrusca, M. Rahman, and A. Sher, “What does the eye tell the brain?: Development of a system for the large-scale recording of retinal output activity,” IEEE Trans. Nucl. Sci. 51, 1434–1440 (2004).
10. R. Segev, J. Goodhouse, J. Puchalla, and M. Berry, “Recording spikes from a large fraction of the ganglion cells in a retinal patch,” Nat. Neurosci. 7, 1154–1161 (2004).
11. D. Knill and W. Richards, eds., Perception as Bayesian Inference (Cambridge Univ. Press, 1996).
12. E. P. Simoncelli, “Distributed analysis and representation of visual motion,” Ph.D. thesis (Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, Massachusetts, 1993). Also available as MIT Media Laboratory Vision and Modeling Technical Report #209.
13. D. Ascher and N. Grzywacz, “A Bayesian model for the measurement of visual velocity,” Vision Res. 40, 3427–3434 (2000).
14. Y. Weiss, E. Simoncelli, and E. Adelson, “Motion illusions as optimal percepts,” Nat. Neurosci. 5, 598–604 (2002).
15. E. P. Simoncelli, “Local analysis of visual motion,” in The Visual Neurosciences, L. M. Chalupa and J. S. Werner, eds. (MIT Press, 2003), Chap. 109, pp. 1616–1623.
16. A. Stocker and E. Simoncelli, “Noise characteristics and prior expectations in human visual speed perception,” Nat. Neurosci. 9, 578–585 (2006).
17. A. E. Welchman, J. M. Lam, and H. H. Bulthoff, “Bayesian motion estimation accounts for a surprising bias in 3D vision,” Proc. Natl. Acad. Sci. U.S.A. 105, 12087–12092 (2008).
18. F. Hurlimann, D. Kiper, and M. Carandini, “Testing the Bayesian model of perceived speed,” Vision Res. 42, 2253–2257 (2002).
19. P. Thompson, K. Brooks, and S. Hammett, “Speed can go up as well as down at low contrast: Implications for models of motion perception,” Vision Res. 46, 782–786 (2005).
20. A. Thiel, M. Greschner, C. Eurich, J. Ammermüller, and J. Kretzberg, “Contribution of individual retinal ganglion cell responses to velocity and acceleration encoding,” J. Neurophysiol. 98, 2285–2296 (2007).
21. J. Kretzberg, I. Winzenborg, and A. Thiel, “Bayesian analysis of the encoding of constant and changing stimulus velocities by retinal ganglion cells,” presented at Frontiers in Neuroinformatics 2008, Stockholm, September 7–9, 2008.
22. D. Brillinger, “Maximum likelihood analysis of spike trains of interacting nerve cells,” Biol. Cybern. 59, 189–200 (1988).
23. P. McCullagh and J. Nelder, Generalized Linear Models (Chapman & Hall, 1989).
24. L. Paninski, “Maximum likelihood estimation of cascade point-process neural encoding models,” Network Comput. Neural Syst. 15, 243–262 (2004).
25. W. Truccolo, U. Eden, M. Fellows, J. Donoghue, and E. Brown, “A point process framework for relating neural spiking activity to spiking history, neural ensemble and extrinsic covariate effects,” J. Neurophysiol. 93, 1074–1089 (2005).
26. L. Paninski, J. Pillow, and J. Lewi, “Statistical models for neural encoding, decoding, and optimal stimulus design,” in Computational Neuroscience: Progress in Brain Research, P. Cisek, T. Drew, and J. Kalaska, eds. (Elsevier, 2007).
27. D. Snyder and M. Miller, Random Point Processes in Time and Space (Springer-Verlag, 1991).
28. D. Field, “Relations between the statistics of natural images and the response profiles of cortical cells,” J. Opt. Soc. Am. A 4, 2379–2394 (1987).
29. D. H. Brainard, D. R. Williams, and H. Hofer, “Trichromatic reconstruction from the interleaved cone mosaic: Bayesian model and the color appearance of small spots,” J. Vision 8, 1–23 (2008).
30. R. Kass and A. Raftery, “Bayes factors,” J. Am. Stat. Assoc. 90, 773–795 (1995).
31. E. Brown, L. Frank, D. Tang, M. Quirk, and M. Wilson, “A statistical paradigm for neural spike train decoding applied to position prediction from ensemble firing patterns of rat hippocampal place cells,” J. Neurosci. 18, 7411–7425 (1998).
32. W. Bialek and A. Zee, “Coding and computation with neural spike trains,” J. Stat. Phys. 59, 103–115 (1990).
33. S. Koyama and S. Shinomoto, “Empirical Bayes interpretations of random point events,” J. Phys. A 38, 531–537 (2005).
34. J. Pillow, Y. Ahmadian, and L. Paninski, “Model-based decoding, information estimation, and change-point detection in multi-neuron spike trains,” submitted to Neural Comput.
35. Y. Ahmadian, J. Pillow, and L. Paninski, “Efficient Markov chain Monte Carlo methods for decoding neural spike trains,” submitted to Neural Comput.
36. E. Adelson and J. Bergen, “Spatiotemporal energy models for the perception of motion,” J. Opt. Soc. Am. A 2, 284–299 (1985).
37. E. Chichilnisky and R. Kalmar, “Temporal resolution of ensemble visual motion signals in primate retina,” J. Neurosci. 23, 6681–6689 (2003).
38. W. Bialek (Princeton University, bbrinker@princeton.edu) and R. de Ruyter van Steveninck (Indiana University, deruyter@indiana.edu) (personal communication, 2003).
39. V. Perry and A. Cowey, “The ganglion cell and cone distributions in the monkey’s retina: implications for central magnification factors,” Vision Res. 25, 1795–1810 (1985).
40. S. Ullman, The Interpretation of Visual Motion (MIT Press, 1979).
41. P. Thompson, “Perceived rate of movement depends on contrast,” Vision Res. 22, 377–380 (1982).
42. L. Stone and P. Thompson, “Human speed perception is contrast dependent,” Vision Res. 32, 1535–1549 (1992).
43. D. C. Bradley and M. S. Goyal, “Velocity computation in the primate visual system,” Nat. Rev. Neurosci. 9, 686–695 (2008).
44. M. Potters and W. Bialek, “Statistical mechanics and visual signal processing,” J. Phys. I France 4, 1755–1775 (1994).
45. S. McKee, G. Silvermann, and K. Nakayama, “Precise velocity discrimination despite random variations in temporal frequency and contrast,” Vision Res. 26, 609–619 (1986).
46. M. Blakemore and R. Snowden, “The effect of contrast upon perceived speed: a general phenomenon?” Perception 28, 33–48 (1999).
47. W. Press, S. Teukolsky, W. Vetterling, and B. Flannery, Numerical Recipes in C (Cambridge Univ. Press, 1992).
48. N. Shephard and M. Pitt, “Likelihood analysis of non-Gaussian measurement time series,” Biometrika 84, 653–667 (1997).
49. R. Davis and G. Rodriguez-Yam, “Estimation for state-space models: an approximate likelihood approach,” Stat. Sin. 15, 381–406 (2005).
50. L. Paninski, Y. Ahmadian, D. Ferreira, S. Koyama, K. Rahnama, M. Vidne, J. Vogelstein, and W. Wu, “A new look at state-space models for neural data,” J. Comput. Neurosci. (to be published). Epub ahead of print, doi 10.1007/s10827-009-0179-x.