IEEE TRANSACTIONS ON BIOMEDICAL ENGINEERING, VOL. 58, NO. 6, JUNE 2011 1507

A Point Process Model for Auditory Neurons

Considering Both Their Intrinsic Dynamics and the

Spectrotemporal Properties of an Extrinsic Signal

Eric Plourde*, Bertrand Delgutte, and Emery N. Brown

Abstract—We propose a point process model of spiking activity

from auditory neurons. The model takes account of the neuron’s

intrinsic dynamics as well as the spectrotemporal properties of

an input stimulus. A discrete Volterra expansion is used to de-

rive the form of the conditional intensity function. The Volterra

expansion models the neuron’s baseline spike rate, its intrinsic

dynamics—spiking history—and the stimulus effect which in this

case is the analog of the spectrotemporal receptive field (STRF).

We performed the model fitting efficiently in a generalized linear

model framework using ridge regression to address properly this

ill-posed maximum likelihood estimation problem. The model pro-

vides an excellent fit to spiking activity from 55 auditory nerve

neurons. The STRF-like representation estimated jointly with the

neuron’s intrinsic dynamics may offer more accurate characteri-

zations of neural activity in the auditory system than current ones

based solely on the STRF.

Index Terms—Auditory system, generalized linear model, point

process, spectrotemporal receptive field, spike train model.

I. INTRODUCTION

UNDERSTANDING the factors that are responsible for inducing neurons to spike is an important, active topic of investigation in neuroscience. One approach is to fit statistical models containing the most salient factors to neural spiking activity and use the fitted model to evaluate the relative importance of the factors. Two key factors or covariates to consider in standard neurophysiology experiments are the intrinsic dynamics of the neuron such as the absolute and relative refractory periods,

Manuscript received October 18, 2010; revised December 28, 2010; accepted
January 5, 2011. Date of publication February 10, 2011; date of current version
May 18, 2011. This work was supported in part by the Fonds québécois de la

recherche sur la nature et les technologies, and in part by the National Insti-

tutes of Health under Grant DP1-OD003646. Asterisk indicates corresponding

author.

∗E. Plourde is with the Neuroscience Statistics Research Laboratory,

Massachusetts General Hospital, Harvard Medical School, Boston, MA 02114

USA, and also with the Department of Brain and Cognitive Sciences, MIT,

Cambridge, MA 02139 USA (e-mail: eplourde@mit.edu).

B. Delgutte is with the Department of Otology and Laryngology,

Harvard Medical School, Boston, MA 02114 USA, and also with the

Harvard−MIT Division of Health Science and Technology and the Re-

search Laboratory of Electronics, MIT, Cambridge, MA 02139 USA (e-mail:

Bertrand_Delgutte@meei.harvard.edu).

E. N. Brown is with the Neuroscience Statistics Research Laboratory,

Massachusetts General Hospital, Harvard Medical School, Boston, MA 02114

USA, and also with the Harvard−MIT Division of Health Science and Tech-

nology and the Department of Brain and Cognitive Sciences, MIT, Cambridge,

MA 02139 USA (e-mail: enb@neurostat.mit.edu).

Digital Object Identifier 10.1109/TBME.2011.2113349

and bursting and network dynamics whereas the primary ex-

trinsic factor is the external stimulus applied in the experiment.

In an auditory experiment, the stimulus is typically a specific

sound pattern.

The intrinsic dynamics, modeled typically in terms of the neuron’s past spiking history, have been established as an important descriptor of spiking propensity in a number of neural systems [1]–[5]. Analyses of auditory neurons have focused on constructing a spectrotemporal receptive field (STRF) by estimating a linear relation between a spectrotemporal representation of the auditory stimulus and the rate function of the neuron [6]. The

coefficients of the linear model comprise the STRF. To date

no statistical model has characterized the spiking propensity of

auditory neurons by representing simultaneously the intrinsic

dynamics and the complete spectrotemporal representation of

the auditory stimulus. Given the point process nature of neu-

ral spiking activity, a principled approach to constructing the

model would be to relate the covariates to the spiking propensity

of the neuron in terms of the conditional intensity (rate) func-

tion (CIF) because a point process is completely described by

its CIF.

We present a point process model for auditory neural spiking

activity that considers both the neuron’s intrinsic dynamics and

the spectrotemporal properties of the auditory stimulus. We for-

mulate the log of the conditional intensity function in terms of a

discrete-time Volterra series expansion of the neuron’s spiking

history and the spectrotemporal decomposition of the auditory

stimulus. The Volterra expansion contains a parameter repre-

senting the baseline spike rate, a set describing the intrinsic

dynamics, and a second set characterizing the stimulus effect,

the analog of the STRF. Using the generalized linear model

(GLM) in a ridge regression framework to address properly the

ill-posed inverse nature of this maximum likelihood estimation

problem, we illustrate our approach by fitting the model to the

spiking activity of 55 auditory nerve neurons in an anesthetized

cat in response to an auditory stimulus.

This letter comprises the following sections: the proposed

statistical model is derived in Section II; the model estimation

is presented in Section III; results, including goodness of fit

of the model and model parameter analyses, are presented in

Section IV; and Section V concludes the study.

II. STATISTICAL MODEL

Consider an observation interval $(0,T]$ and spike times $0 < u_1 < u_2 < \cdots < u_{N(T)} < T$. The CIF of the spike train is defined

0018-9294/$26.00 © 2011 IEEE


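by [1]

$$\lambda(t \mid H_t) = \lim_{\Delta \to 0} \frac{\Pr\{[N(t+\Delta) - N(t)] = 1 \mid H_t\}}{\Delta} \tag{1}$$

where $N(t)$ is the number of spikes in the interval $(0,t]$ for $t \in (0,T]$ and $H_t$ is the relevant history of the covariates at $t$. It follows that for $\Delta$ small:

$$\Pr(\text{spike in } (t,t+\Delta] \mid H_t) \approx \lambda(t \mid H_t)\Delta. \tag{2}$$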

The CIF is, for a point process, a history-dependent generalization of the rate function of a Poisson process. To obtain a discrete formulation of the CIF we choose $K$ sufficiently large so that each subinterval $\Delta = K^{-1}T$ contains at most one spike. We index the subintervals $k = 1,\ldots,K$ and define $n_k$ to be 1 if there is a spike in the subinterval $((k-1)\Delta, k\Delta]$ and 0 if there is no spike. For our analysis, we choose $K$ so that $\Delta$ is 1 ms, consistent with the absolute refractory period of a neuron.
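The discretization just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `bin_spikes`, the use of seconds as the time unit, and the three spike times are hypothetical and do not come from the recordings analyzed here.

```python
import math

def bin_spikes(spike_times, T, delta=1e-3):
    """Return [n_1, ..., n_K], with n_k = 1 if a spike lies in ((k-1)*delta, k*delta]."""
    K = int(round(T / delta))
    n = [0] * K
    for u in spike_times:
        if not 0.0 < u <= T:
            raise ValueError("spike time outside (0, T]")
        # 1-based index of the half-open bin containing u; the clamp guards
        # against floating-point round-up at the right edge of the record.
        k = min(math.ceil(u / delta), K)
        if n[k - 1]:
            raise ValueError("two spikes in one bin; use a smaller delta")
        n[k - 1] = 1
    return n

# Three made-up spike times (seconds) in a 10 ms record, binned at 1 ms.
n = bin_spikes([0.0012, 0.0047, 0.0093], T=0.010)
print(n)
```

Raising an error when two spikes share a bin mirrors the requirement that each subinterval contain at most one spike; a smaller bin width would be the remedy.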

Let $s_{k,j}$ be the value of a spectrotemporal representation of the sound stimulus with frequency band $j$ at time $k\Delta$ for $j = 1,\ldots,J$. Define the relevant history of the sound stimulus for predicting the current spiking propensity as $H_{k,j} = \{s_{k,j},\ldots,s_{k-L,j}\}$, assuming a dependence that goes back $L$ time periods. Similarly, define the relevant spiking history for predicting the current spiking propensity as $H_{k,J+1} = \{n_{k-1},\ldots,n_{k-P}\}$, assuming a dependence that goes back $P$ time periods. Let $H_k = \{H_{k,1},\ldots,H_{k,J+1}\}$. If we assume that there is a functional $F$ that describes the relation between $H_k$ and the CIF $\lambda(k\Delta \mid H_k)$ then we can expand the log of the CIF in a discrete Volterra series as [7]

$$\log(\lambda(k\Delta \mid H_k,\beta)) = F(H_k,\beta) = \beta_0 + \sum_{j=1}^{J}\sum_{l=0}^{L-1}\beta_{l,j}\,s_{k-l,j} + \sum_{p=1}^{P}\beta_{p,J+1}\,n_{k-p} + \text{higher terms} \tag{3}$$

where $\beta = \{\beta_0, \beta_{0,1},\ldots,\beta_{L-1,J}, \beta_{1,J+1},\ldots,\beta_{P,J+1}\}$ is the $(JL + P + 1) \times 1$ vector of Volterra kernels. We interpret the Volterra series expansion as the sum of the outputs of $J + 1$ linear filters having Volterra kernels as the impulse responses. The kernels $\beta_{l,j}$ are the analogs of the STRFs used to characterize auditory neurons. The kernel $\beta_{p,J+1}$ models the effect of the spiking history and $\beta_0$ governs the mean spiking rate. Exponentiating both sides of (3) yields the CIF

$$\lambda(k\Delta \mid H_k,\beta) = \exp\left\{\beta_0 + \sum_{j=1}^{J}\sum_{l=0}^{L-1}\beta_{l,j}\,s_{k-l,j} + \sum_{p=1}^{P}\beta_{p,J+1}\,n_{k-p}\right\} \tag{4}$$

where we have neglected the higher order terms.
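As an illustration of (4), the following sketch evaluates the CIF at one time bin from given kernels. The function name `cif`, the 0-based array layout, the zero padding before the start of the record, and all numerical values are assumptions made for the example; they are not taken from the fitted models.

```python
import math

def cif(k, s, n, beta0, beta_stim, beta_hist):
    """Evaluate lambda(k*Delta | H_k, beta) as in equation (4).

    s[j][k]         stimulus value in band j at bin k (0-based indices)
    n[k]            0/1 spike indicator at bin k
    beta_stim[j][l] stimulus kernel for band j at lag l = 0..L-1
    beta_hist[p-1]  history kernel at lag p = 1..P
    Samples before the start of the record are treated as zero (an assumption).
    """
    log_lam = beta0
    for j, kernel in enumerate(beta_stim):
        for l, b in enumerate(kernel):
            if k - l >= 0:
                log_lam += b * s[j][k - l]
    for p, b in enumerate(beta_hist, start=1):
        if k - p >= 0:
            log_lam += b * n[k - p]
    return math.exp(log_lam)

# Toy setting: J = 1 band, L = 2 stimulus lags, P = 2 history lags.
s = [[0.0, 1.0, 0.5]]
beta0 = math.log(0.05)        # baseline of 0.05 spikes per bin
beta_stim = [[0.4, 0.2]]
beta_hist = [-2.0, -0.5]      # negative values mimic refractoriness

lam_spike = cif(2, s, [0, 1, 0], beta0, beta_stim, beta_hist)  # spike one bin ago
lam_quiet = cif(2, s, [0, 0, 0], beta0, beta_stim, beta_hist)  # no recent spike
print(lam_spike / lam_quiet)  # multiplicative effect of the lag-1 history term
```

Because the model is linear in the log of the CIF, a spike one bin in the past scales the intensity by exp of the lag-1 history kernel, which is why exponentiated kernels are convenient to plot.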

Constructing the Volterra series expansion in terms of the log

of the discrete CIF ensures that the CIF is nonnegative and that

the relation between the spiking activity and the sound stimulus

and the spiking history can be modeled using a generalized

linear model (GLM) with either a binomial or a Poisson link function [3]. Models with a similar structure to the one derived in (4) have been considered previously for other neural systems [1]–[5]. None of these analyses was motivated by a Volterra series expansion nor has this model been used in an analysis of auditory spiking activity in which the intrinsic dynamics and the STRF were simultaneously estimated.

Fig. 1. KS plots (black lines) of (a) the best and (b) worst fits from a set of 55 auditory nerve recordings. Dashed lines indicate the 95% confidence bound.

TABLE I
OCCURRENCES OF NORMALIZED KS STATISTIC VALUES $\hat{D}$ FOR THE COMPLETE 55 AUDITORY NERVE DATASET OBTAINED USING FOURFOLD CROSS VALIDATION

III. MODEL ESTIMATION

We can rewrite (3) in a more compact form as

$$\log(\lambda) = X\beta \tag{5}$$

where $X$ is the $RK \times (JL + P + 1)$ matrix of covariates, the factor $R$ accounts for the number of trials, and the logarithm function is applied elementwise. It follows that the log likelihood function for estimating $\beta$ is [1]–[4]

$$L(\beta) = \sum_{k=1}^{K}\sum_{r=1}^{R} n_{k,r}\log(\lambda_k \mid \beta) - \sum_{k=1}^{K}\sum_{r=1}^{R} (\lambda_k \mid \beta)\Delta \tag{6}$$

where the inner sum is over trials. An advantage of (5) is that

it shows that (6) is equivalent to a GLM with a Poisson log

likelihood function. We can, therefore, use the GLM framework

to estimate β.
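To make the link between (5) and (6) concrete, the sketch below builds the rows of a design matrix for a single trial and evaluates the log likelihood for a given parameter vector. The helper names `design_row` and `log_likelihood`, the column ordering (intercept, then lags within each band, then history lags), the zero padding at the start of the record, and the toy data are all assumptions of this example. For R trials one would stack the rows of every trial to obtain the RK rows of X.

```python
import math

def design_row(k, s, n, L, P):
    """Row of X for bin k: intercept, L stimulus lags per band, P history lags."""
    row = [1.0]
    for band in s:
        row += [band[k - l] if k - l >= 0 else 0.0 for l in range(L)]
    row += [float(n[k - p]) if k - p >= 0 else 0.0 for p in range(1, P + 1)]
    return row

def log_likelihood(X, n, beta, delta=1.0):
    """Equation (6) for one trial: sum over bins of n*log(lambda) - lambda*delta."""
    total = 0.0
    for row, n_k in zip(X, n):
        log_lam = sum(x * b for x, b in zip(row, beta))  # one entry of X @ beta, eq. (5)
        total += n_k * log_lam - math.exp(log_lam) * delta
    return total

# Toy single-trial data: J = 1 band, K = 3 bins, L = 2, P = 2.
s = [[0.0, 1.0, 0.5]]
n = [0, 1, 0]
X = [design_row(k, s, n, L=2, P=2) for k in range(len(n))]

beta = [math.log(0.1), 0.0, 0.0, 0.0, 0.0]   # constant rate of 0.1 spikes per bin
ll = log_likelihood(X, n, beta)
print(len(X[0]), ll)
```

Each row has JL + P + 1 entries, matching the length of the Volterra kernel vector.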

It is well known that the Fisher scoring algorithm for GLM parameter estimation with canonical link functions can be carried out by iteratively reweighted least-squares (IRLS) [8]. In the IRLS algorithm, the maximum likelihood estimate of $\beta$ is computed iteratively by solving successive weighted least squares (WLS) subproblems. The WLS subproblems are solved using a conjugate gradient algorithm that greatly reduces the computational time. Our problem has an additional feature that must be considered. The design matrix $X$ includes 1 ms delayed versions of the input spectrotemporal representations. As a consequence, many of the columns of $X$ are highly correlated, especially when the auditory stimulus is presented across a large number

of trials. This suggests that the estimation of $\beta$ is an ill-posed inverse problem that can be solved by regularization. We can estimate $\beta$ using the truncated regularized iteratively reweighted least-squares (TR-IRLS) algorithm for GLM parameter estimation [8]. The TR-IRLS is a variant of the IRLS algorithm in which a ridge parameter can be included in each WLS subproblem to provide a quadratic regularization.

Fig. 2. CIF parameter values. (a) Scatter plot of $\exp(\beta_0)$ versus the measured baseline firing rate with Pearson correlation coefficient $\rho = 0.92$. (b) Mean of $e^{\beta_{p,J+1}}$, with 95% confidence interval, versus a latency $p$ up to $P = 40$ ms. (c) Example of exponentiated $\beta_{l,j}$ values displayed according to the center frequency of the corresponding gammatone filter $j$ and latency $l$ (CF = 1024 Hz) with $J = 25$ and $L = 10$ ms. (d) Scatter plot of the center frequency of the filter corresponding to the $\beta_{l,j}$ with the highest value versus the measured characteristic frequency of the corresponding neuron; $\rho = 0.82$.

Introducing the regularization term $\tau$, the estimate of $\beta$ at iteration $i + 1$ of the TR-IRLS is computed as

$$\hat{\beta}_{i+1} = (X^T W_i X + \tau I)^{-1} X^T W_i z_i \tag{7}$$

where $W_i$ is an $RK \times RK$ diagonal weight matrix whose diagonal elements are $\boldsymbol{\lambda}_i$, the vector of CIF estimates at iteration $i$, $z_i = X\hat{\beta}_i + W_i^{-1}(n - \boldsymbol{\lambda}_i)$ is the so-called adjusted dependent covariate [8] and $n$ is the column vector of all of the $n_{k,r}$ spiking activity across all trials and across all times in the experiment.
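The update in (7) can be sketched in plain Python as a ridge-penalized IRLS loop whose linear systems are solved by conjugate gradient. This is a minimal illustration and not the TR-IRLS implementation of [8]: it runs a fixed number of IRLS steps, lets conjugate gradient run to convergence instead of truncating it, takes the bin width as one unit so that the CIF equals the expected count per bin, and penalizes every coefficient including the intercept, as the term τI in (7) does. The function names and the toy data are hypothetical.

```python
import math

def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def conjugate_gradient(A, b, iters=200, tol=1e-20):
    """Solve A x = b for a symmetric positive-definite matrix A."""
    x = [0.0] * len(b)
    r = b[:]                          # residual b - A x for the start x = 0
    d = r[:]                          # search direction
    rs = sum(v * v for v in r)
    for _ in range(iters):
        if rs < tol:
            break
        Ad = matvec(A, d)
        alpha = rs / sum(p * q for p, q in zip(d, Ad))
        x = [xi + alpha * di for xi, di in zip(x, d)]
        r = [ri - alpha * qi for ri, qi in zip(r, Ad)]
        rs_new = sum(v * v for v in r)
        d = [ri + (rs_new / rs) * di for ri, di in zip(r, d)]
        rs = rs_new
    return x

def ridge_irls(X, n, tau=0.1, steps=25):
    """Iterate equation (7) for a Poisson GLM with log link."""
    rows, m = len(X), len(X[0])
    beta = [0.0] * m
    for _ in range(steps):
        eta = matvec(X, beta)
        lam = [math.exp(e) for e in eta]                        # diagonal of W_i
        z = [e + (y - w) / w for e, y, w in zip(eta, n, lam)]   # adjusted dependent variable
        # Ridge-penalized normal equations: (X^T W X + tau I) beta = X^T W z
        A = [[sum(lam[i] * X[i][a] * X[i][c] for i in range(rows))
              + (tau if a == c else 0.0) for c in range(m)] for a in range(m)]
        rhs = [sum(lam[i] * X[i][a] * z[i] for i in range(rows)) for a in range(m)]
        beta = conjugate_gradient(A, rhs)
    return beta

# Toy data: an intercept plus one binary covariate; spikes are more frequent when it is 1.
X = [[1.0, 0.0]] * 4 + [[1.0, 1.0]] * 4
n = [0, 0, 1, 0, 1, 1, 0, 1]
beta = ridge_irls(X, n, tau=0.1)
print(beta)
```

At a fixed point of (7) the penalized score vanishes, that is, the product of the transposed design matrix with (n minus the CIF vector) equals τ times the estimate, which gives a simple convergence check for a sketch like this one.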

IV. RESULTS

We applied the proposed statistical model to neural spiking

activity recorded in the auditory nerves of anesthetized cats fol-

lowing the presentation of the input sentence “wood is best for

making toys and blocks” spoken by a male voice and sampled

at 10 kHz [9]. The dataset was composed of the spike train

responses of 55 distinct neurons each recorded across R = 20

trials. The spectrotemporal representation of the input speech

was obtained through a modified version of an auditory spec-

trogram [10] that applied a gammatone filterbank to the input

speech signal. The bandwidths of the filters were modified according to [11] to represent adequately the processing performed

by the cat’s cochlea. To obtain comparable frequency bins, each

was normalized by its Euclidean norm over the entire time do-

main. We used P = 40, L = 10, and J = 25, where the center

frequencies ranged from 20 Hz to 4.4 kHz, and τ = 0.1 was the

regularization parameter in the TR-IRLS algorithm.
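The per-band normalization mentioned above amounts to dividing each frequency bin of the spectrogram by its Euclidean norm over time. A small sketch follows, assuming the spectrogram is stored as one list of samples per frequency band; the function name and the numbers are illustrative only.

```python
import math

def normalize_bands(spectrogram):
    """Divide each frequency band (row) by its Euclidean norm over time."""
    out = []
    for band in spectrogram:
        norm = math.sqrt(sum(v * v for v in band))
        # Leave an all-zero band unchanged to avoid dividing by zero.
        out.append([v / norm for v in band] if norm > 0 else band[:])
    return out

bands = normalize_bands([[3.0, 4.0], [0.0, 2.0], [0.0, 0.0]])
print(bands)
```

After this step every nonzero band has unit energy, so the magnitudes of the stimulus kernels can be compared across frequency bands.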

A. Model Goodness of Fit

To evaluate the model goodness-of-fit, we used the time rescaling theorem with rescaled times computed from the estimated CIF [3]. If the latter is a good approximation to the true CIF of the point process, then the rescaled times will be independent and uniformly distributed on the interval [0,1). We used Kolmogorov–Smirnov (KS) plots to assess the uniformity of the rescaled times and the autocorrelation function (ACF) of transformed rescaled times to assess their independence [3], [12].

Fig. 1 shows the KS plots of the best and worst fits in which the upper and lower 95% confidence bounds are indicated by dashed lines. If the model is accurate, the KS plot should show an empirical distribution $\hat{F}(x)$ versus the fitted cumulative distribution $F(x)$ that lies along the 45° line. As shown in Fig. 1(a), the curve for the best fit is indistinguishable from the 45° line indicating a close model fit. Even for the worst case [see Fig. 1(b)], the fitting is also quite good, with the KS plot being always within or extremely close to the confidence bounds.
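The time-rescaling step can be sketched as follows. This example uses the basic approximation in which the integral of the CIF between spikes is replaced by a sum over bins; it does not include the discrete-time correction of [12]. It also assumes the first interval is measured from the start of the record. The function names and the toy values are hypothetical.

```python
import math

def rescaled_times(n, lam, delta=1.0):
    """Map each interspike interval to z = 1 - exp(-sum of lam*delta over the interval)."""
    z, acc = [], 0.0
    for n_k, lam_k in zip(n, lam):
        acc += lam_k * delta          # accumulate the integrated intensity
        if n_k:
            z.append(1.0 - math.exp(-acc))
            acc = 0.0                 # restart the integral after each spike
    return z

def ks_plot_points(z):
    """Pairs (uniform model quantile, sorted rescaled time) for a KS plot."""
    zs = sorted(z)
    m = len(zs)
    return [((i + 0.5) / m, zs[i]) for i in range(m)]

# Toy example: constant estimated CIF of 0.1 spikes per bin, spikes in bins 2 and 5.
z = rescaled_times([0, 1, 0, 0, 1], [0.1] * 5)
points = ks_plot_points(z)
print(points)
```

If the estimated CIF is accurate, the plotted pairs should fall close to the 45° line.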

To evaluate the fitting in the entire dataset, we used the normalized KS statistic that is defined as

$$\hat{D} = \sup_x \left| \frac{\hat{F}(x) - F(x)}{\hat{B}} \right| \tag{8}$$

where $\hat{B}$ is the width of the 95% confidence bound. A value of $\hat{D} < 1$, thus, indicates that the entire KS plot is within the 95% confidence bound. Table I shows the number of occurrences of normalized KS statistic values determined using fourfold cross validation. Forty-four of the 55 (80%) $\hat{D}$ values were less than 1, which supports the excellent fit of the models. Most of the $\hat{D}$ values that were greater than 1 were less than 1.05, suggesting that these models as well were in close agreement with the data.

The ACF of the transformed rescaled times was also evaluated to assess their independence. It was observed (results not shown here) that the rescaled times were highly independent with 92.5% of the ACF values in the entire auditory nerve dataset (n = 55) being inside their respective confidence bounds for lags up to 100 ms. The findings from the KS and ACF analyses suggest that the estimated CIFs provide excellent approximations to the true CIFs.
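A possible way to compute the normalized KS statistic of (8) from a set of rescaled times is sketched below. The width of the 95% bound is taken here to be the common large-sample approximation 1.36 divided by the square root of the number of rescaled times; that choice, the function name, and the toy values are assumptions of the example and are not stated in the text.

```python
import math

def normalized_ks(z, bound_coeff=1.36):
    """Largest gap between the empirical CDF of z and the uniform CDF, divided by the bound."""
    zs = sorted(z)
    m = len(zs)
    # The empirical CDF jumps from i/m to (i+1)/m at zs[i]; check both sides of each jump.
    d = max(max((i + 1) / m - x, x - i / m) for i, x in enumerate(zs))
    bound = bound_coeff / math.sqrt(m)   # assumed 95% bound
    return d / bound

d_hat = normalized_ks([0.625, 0.125, 0.875, 0.375])
print(d_hat)
```

A returned value below 1 corresponds to a KS plot that stays inside the confidence bound everywhere.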

B. Model Parameter Analyses

We present the estimated values of the model parameters, i.e., the Volterra kernels $\beta_0$, $\beta_{p,J+1}$, and $\beta_{l,j}$ obtained using the 20 trials for each neuron recording.

Fig. 2(a) shows a scatter plot of the estimated baseline parameter $\exp(\beta_0)$ versus the baseline firing rate measured in [9]. As can be observed, there is a strong correlation between $\exp(\beta_0)$ and the measured baseline firing rate (Pearson correlation coefficient $\rho = 0.92$).


The mean values of the exponentiated history parameters $\beta_{p,J+1}$ with the error bars of the 95% confidence interval are plotted in Fig. 2(b). This plot shows that these parameters accurately capture the refractory behavior of the neurons because the mean value of $\exp(\beta_{p,J+1})$ is appreciably less than 1.

The values of the exponentiated stimulus parameters $\beta_{l,j}$ for a neuron with a characteristic frequency of 1024 Hz are shown in Fig. 2(c). This representation resembles closely that of an STRF. It has a well-defined preferred spectrotemporal region that is fairly restricted for auditory nerve recordings. Moreover, the center frequency of the filter with the highest parameter value (1140 Hz) corresponds to the measured characteristic frequency (1024 Hz). We plot in Fig. 2(d) the center frequency of the filter $j$ corresponding to the spectrotemporal parameter $\beta_{l,j}$ with the highest value as a function of the measured characteristic frequency for the entire dataset. There is a high correlation between the two ($\rho = 0.82$) indicating a good description of the CF by the model.¹
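The quantity plotted in Fig. 2(d) can be read off a fitted stimulus kernel by locating the band that holds the largest coefficient. The sketch below assumes the kernel is stored as one list of lag coefficients per band; the function name, the kernel values, and the list of center frequencies are made up for illustration.

```python
def best_frequency(beta_stim, center_freqs_hz):
    """Center frequency of the band j that holds the largest beta_{l,j} over all lags l."""
    j_best = max(range(len(beta_stim)), key=lambda j: max(beta_stim[j]))
    return center_freqs_hz[j_best]

# Three hypothetical bands with two lags each.
beta_stim = [[0.1, 0.0], [0.2, 0.9], [0.3, 0.1]]
center_freqs_hz = [500.0, 1140.0, 2000.0]
bf = best_frequency(beta_stim, center_freqs_hz)
print(bf)
```

Comparing this value with the measured characteristic frequency across neurons yields the kind of scatter plot summarized by the correlation coefficient above.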

V. CONCLUSION

We presented a point process model of neural spiking activity in the auditory system that takes account of both the neuron’s intrinsic dynamics and the spectrotemporal properties of an input sound stimulus. We derived the associated CIF using a discrete Volterra expansion formulated in terms of the spiking history and the spectrotemporal components. This CIF model makes it possible to assess the relative importance of the neuron’s intrinsic dynamics (spiking history) and the STRF (spectrotemporal components). We fit the model to actual auditory nerve

spiking activity by regularized maximum likelihood estimation.

The models gave accurate descriptions of neural spiking activ-

ity in terms of KS goodness-of-fit analyses. This model, which

considers both the neuron’s intrinsic dynamics and the STRF,

may offer a way of obtaining more accurate characterizations of

activity in other parts of the auditory system.

¹ Additional examples of exponentiated $\beta_{l,j}$ values as well as estimated CIF for different neurons from the dataset can be found at http://www.neurostat.mit.edu/publications.

ACKNOWLEDGMENT

The authors would like to thank R. Haslinger and D. Ba

for kindly providing, respectively, their KS plot and TR-IRLS

algorithms for this study.

REFERENCES

[1] E. N. Brown, R. Barbieri, U. T. Eden, and L. M. Frank, “Likelihood methods for neural spike train data analysis,” in Computational Neuroscience: A Comprehensive Approach, J. Feng, Ed. London, U.K.: CRC Press, 2003, pp. 253–286.

[2] L. Paninski, “Maximum likelihood estimation of cascade point-process neural encoding models,” Netw.: Comput. Neural Syst., vol. 15, pp. 243–262, 2004.

[3] W. Truccolo, U. T. Eden, M. R. Fellows, J. P. Donoghue, and E. N. Brown, “A point process framework for relating neural spiking activity to spiking history, neural ensemble, and extrinsic covariate effects,” J. Neurophysiol., vol. 93, pp. 1074–1089, 2005.

[4] S. V. Sarma, U. T. Eden, M. L. Cheng, Z. M. Williams, R. Hu, E. Eskandar, and E. N. Brown, “Using point process models to compare neural spiking activity in the subthalamic nucleus of Parkinson’s patients and a healthy primate,” IEEE Trans. Biomed. Eng., vol. 57, no. 6, pp. 1297–1305, Jun. 2010.

[5] W. Truccolo, L. R. Hochberg, and J. P. Donoghue, “Collective dynamics in human and monkey sensorimotor cortex: Predicting single neuron spikes,” Nature Neurosci., vol. 13, no. 1, pp. 105–111, 2010.

[6] F. E. Theunissen, K. Sen, and A. J. Doupe, “Spectral-temporal receptive fields of nonlinear auditory neurons obtained using natural sounds,” J. Neurosci., vol. 20, no. 6, pp. 2315–2331, 2000.

[7] V. Z. Marmarelis, Nonlinear Dynamic Modeling of Physiological Systems. Piscataway, N.J.: IEEE Press, 2005.

[8] P. Komarek, “Logistic regression for data mining and high-dimensional classification,” Ph.D. dissertation, Dept. Math. Sci., Carnegie Mellon Univ., Pittsburgh, PA, 2004.

[9] B. Delgutte, B. M. Hammond, and P. A. Cariani, “Neural coding of the temporal envelope of speech: Relation to modulation transfer functions,” in Psychophysical and Physiological Advances in Hearing. R. Palmer, A. Reese, A. Q. Summerfield, and R. Meddis, Eds. London, U.K.: Whurr, 1998, pp. 595–603.

[10] D. Ellis. (2009). Gammatone-like spectrograms. Web resource, [Online]. Available: http://www.ee.columbia.edu/~dpwe/resources/matlab/gammatonegram/

[11] L. H. Carney and T. C. T. Yin, “Temporal coding of resonances by low-frequency auditory nerve fibers: Single fiber responses and a population model,” J. Neurophysiol., vol. 60, pp. 1653–1677, 1988.

[12] R. Haslinger, G. Pipa, and E. N. Brown, “Discrete time rescaling theorem: Determining goodness of fit for discrete time statistical models of neural spiking,” Neural Comput., vol. 22, no. 10, pp. 2477–2506, Oct. 2010.
