All content in this area was uploaded by Shigeru Shinomoto on May 07, 2018

Computational Neuroscience: Mathematical and Statistical Perspectives

Robert E. Kass1, Shun-ichi Amari2, Kensuke Arai3, Emery N. Brown4,5, Casey O. Diekman6, Markus Diesmann7,8, Brent Doiron9, Uri T. Eden3, Adrienne L. Fairhall10, Grant M. Fiddyment3, Tomoki Fukai2, Sonja Grün7,8, Matthew T. Harrison11, Moritz Helias7,8, Hiroyuki Nakahara2, Jun-nosuke Teramae12, Peter J. Thomas13, Mark Reimers14, Jordan Rodu1, Horacio G. Rotstein6, Eric Shea-Brown10, Hideaki Shimazaki15,16, Shigeru Shinomoto16, Byron M. Yu1, and Mark A. Kramer3

1Carnegie Mellon University, Pittsburgh, PA, USA, 15213; email: kass@stat.cmu.edu

2RIKEN Brain Science Institute, Wako, Saitama Prefecture, Japan, 351-0198

3Boston University, Boston, MA, USA, 02215

4Massachusetts Institute of Technology, Cambridge, MA, USA, 02139

5Harvard Medical School, Boston, MA, USA, 02115

6New Jersey Institute of Technology, Newark, NJ, USA, 07102

7Jülich Research Centre, Jülich, Germany, 52428

8RWTH Aachen University, Aachen, Germany, 52062

9University of Pittsburgh, Pittsburgh, PA, USA, 15260

10University of Washington, Seattle, WA, USA, 98105

11Brown University, Providence, RI, USA, 02912

12Osaka University, Suita, Osaka Prefecture, Japan, 565-0871

13Case Western Reserve University, Cleveland, OH, USA, 44106

14Michigan State University, East Lansing, MI, USA, 48824

15Honda Research Institute Japan, Wako, Saitama Prefecture, Japan, 351-0188

16Kyoto University, Kyoto, Kyoto Prefecture, Japan, 606-8502


Copyright © YYYY by Annual Reviews. All rights reserved

Keywords

Neural data analysis, neural modeling, neural networks, theoretical neuroscience.

Abstract

Mathematical and statistical models have played important roles in neuroscience, especially by describing the electrical activity of neurons recorded individually, or collectively across large networks. As the field moves forward rapidly, new challenges are emerging. For maximal effectiveness, those working to advance computational neuroscience will need to appreciate and exploit the complementary strengths of mechanistic theory and the statistical paradigm.

2 Kass et al.

Contents

1. Introduction
1.1. The brain-as-computer metaphor
1.2. Neurons as electrical circuits
1.3. Receptive fields and tuning curves
1.4. Networks
1.5. Statistical models
1.6. Recording modalities
1.7. Data analysis
1.8. Components of the nervous system
2. Single Neurons
2.1. LIF models and their extensions
2.2. Biophysical models
2.3. Point process regression models of single neuron activity
2.4. Point process regression and leaky integrate-and-fire models
2.5. Multidimensional models
2.6. Statistical challenges in biophysical modeling
3. Networks
3.1. Mechanistic approaches for modeling small networks
3.2. Statistical methods for small networks
3.3. Mechanistic models of large networks across scales and levels of complexity
3.4. Statistical methods for large networks
3.5. Connecting mathematical and statistical approaches in large networks
4. Outlook

1. Introduction

Brain science seeks to understand the myriad functions of the brain in terms of principles that lead from molecular interactions to behavior. Although the complexity of the brain is daunting and the field seems brazenly ambitious, painstaking experimental efforts have made impressive progress. While investigations, being dependent on methods of measurement, have frequently been driven by clever use of the newest technologies, many diverse phenomena have been rendered comprehensible through interpretive analysis, which has often leaned heavily on mathematical and statistical ideas. These ideas are varied, but a central framing of the problem has been to “elucidate the representation and transmission of information in the nervous system” (Perkel and Bullock 1968). In addition, new and improved measurement and storage devices have enabled increasingly detailed recordings, as well as methods of perturbing neural circuits, with many scientists feeling at once excited and overwhelmed by opportunities of learning from the ever-larger and more complex data sets they are collecting. Thus, computational neuroscience has come to encompass not only a program of modeling neural activity and brain function at all levels of detail and abstraction, from sub-cellular biophysics to human behavior, but also advanced methods for analysis of neural data.

In this article we focus on a fundamental component of computational neuroscience, the modeling of neural activity recorded in the form of action potentials (APs), known as spikes, and sequences of them known as spike trains (see Figure 1). In a living organism, each neuron is connected to many others through synapses, with the totality forming a large network. We discuss both mechanistic models formulated with differential equations and statistical models for data analysis, which use probability to describe variation. Mechanistic and statistical approaches are complementary, but their starting points are different, and their models have tended to incorporate different details. Mechanistic models aim to explain the dynamic evolution of neural activity based on hypotheses about the properties governing the dynamics. Statistical models aim to assess major drivers of neural activity by taking account of indeterminate sources of variability labeled as noise. These approaches have evolved separately, but are now being drawn together. For example, neurons can be either excitatory, causing depolarizing responses at downstream (post-synaptic) neurons (i.e., responses that push the voltage toward the firing threshold, as illustrated in Figure 1), or inhibitory, causing hyperpolarizing post-synaptic responses (that push the voltage away from threshold). This detail has been crucial for mechanistic models but, until relatively recently, has been largely ignored in statistical models. On the other hand, during experiments, neural activity changes while an animal reacts to a stimulus or produces a behavior. This kind of non-stationarity has been seen as a fundamental challenge in the statistical work we review here, while mechanistic approaches have tended to emphasize emergent behavior of the system. In current research, as the two perspectives are being combined increasingly often, the distinction has become blurred. Our purpose in this review is to provide a succinct summary of key ideas in both approaches, together with pointers to the literature, while emphasizing their scientific interactions. We introduce the subject with some historical background, and in subsequent sections describe mechanistic and statistical models of the activity of individual neurons and networks of neurons. We also highlight several domains where the two approaches have had fruitful interaction.

www.annualreviews.org • Computational Neuroscience 3

Figure 1

Action potential and spike trains. The left panel shows the voltage drop recorded across a neuron’s cell membrane. The voltage fluctuates stochastically, but tends to drift upward, and when it rises to a threshold level (dashed line) the neuron fires an action potential, after which it returns to a resting state; the neuron then responds to inputs that will again make its voltage drift upward toward the threshold. This is often modeled as drifting Brownian motion that results from excitatory and inhibitory Poisson process inputs (Tuckwell 1988; Gerstein and Mandelbrot 1964). The right panel shows spike trains recorded from 4 neurons repeatedly across 3 experimental replications, known as trials. The spike times are irregular within trials, and there is substantial variation across trials, and across neurons.
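The drift-to-threshold picture in this caption can be illustrated with a short simulation. The following is only a sketch under stated assumptions, not a model taken from the article: the function name, input rates, jump size, threshold, and the reset to zero are all hypothetical choices, and Poisson arrivals are approximated by at most one excitatory and one inhibitory event per small time step.

```python
import random


def simulate_random_walk_spikes(rate_exc=1200.0, rate_inh=800.0, jump=0.5,
                                threshold=15.0, duration=1.0, dt=1e-4, seed=0):
    """Return spike times (in seconds) of a simple integrator neuron.

    In each step of length dt an excitatory input arrives with probability
    rate_exc * dt (voltage moves up by `jump`) and an inhibitory input with
    probability rate_inh * dt (voltage moves down by `jump`). Reaching
    `threshold` counts as a spike, after which the voltage is reset to 0.
    All values are illustrative assumptions in arbitrary voltage units.
    """
    rng = random.Random(seed)
    voltage, spike_times = 0.0, []
    for step in range(int(round(duration / dt))):
        if rng.random() < rate_exc * dt:
            voltage += jump
        if rng.random() < rate_inh * dt:
            voltage -= jump
        if voltage >= threshold:
            spike_times.append((step + 1) * dt)
            voltage = 0.0  # return to the resting state
    return spike_times


if __name__ == "__main__":
    for trial in range(3):
        times = simulate_random_walk_spikes(seed=trial)
        print(f"trial {trial}: {len(times)} spikes")
```

Because excitation outweighs inhibition in this sketch, the voltage drifts upward on average, while the randomness of the inputs makes the spike times irregular and different from trial to trial, as in the right panel of the figure.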


1.1. The brain-as-computer metaphor

The modern notion of computation may be traced to a series of investigations in mathematical logic in the 1930s, including the Turing machine (Turing 1937). Although we now understand logic as a mathematical subject existing separately from human cognitive processes, it was natural to conceptualize the rational aspects of thought in terms of logic (as in Boole’s 1854 Investigation of the Laws of Thought (Boole 1854, p. 1) which “aimed to investigate those operations of the mind by which reasoning is performed”), and this led to the 1943 proposal by Craik that the nervous system could be viewed “as a calculating machine capable of modeling or paralleling external events” (Craik 1943, p. 120) while McCulloch and Pitts provided what they called “A logical calculus of the ideas immanent in nervous activity” (McCulloch and Pitts 1943). In fact, while it was an outgrowth of preliminary investigations by a number of early theorists (Piccinini 2004), the McCulloch and Pitts paper stands as a historical landmark for the origins of artificial intelligence, along with the notion that mind can be explained by neural activity through a formalism that aims to define the brain as a computational device; see Figure 2. In the same year another noteworthy essay, by Norbert Wiener and colleagues, argued that in studying any behavior its purpose must be considered, and this requires recognition of the role of error correction in the form of feedback (Rosenblueth et al. 1943). Soon after, Wiener consolidated these ideas in the term cybernetics (Wiener 1948). Also, in 1948 Claude Shannon published his hugely influential work on information theory which, beyond its technical contributions, solidified information (the reduction of uncertainty) as an abstract quantification of the content being transmitted across communication channels, including those in brains and computers (Shannon and Weaver 1949).

The first computer program that could do something previously considered exclusively the product of human minds was the Logic Theorist of Newell and Simon (Newell and Simon 1956), which succeeded in proving 38 of the 52 theorems concerning the logical foundations of arithmetic in Chapter 2 of Principia Mathematica (Whitehead and Russell 1912). The program was written in a list-processing language they created (a precursor to LISP), and provided a hierarchical symbol manipulation framework together with various heuristics, which were formulated by analogy with human problem-solving (Gugerty 2006). It was also based on serial processing, as envisioned by Turing and others.

A different kind of computational architecture, developed by Rosenblatt (Rosenblatt 1958), combined the McCulloch-Pitts conception with a learning rule based on ideas articulated by Hebb in 1949 (Hebb 1949), now known as Hebbian learning. Hebb’s rule was, “When an axon of cell A is near enough to excite a cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that A’s efficiency, as one of the cells firing B, is increased” (Hebb 1949); that is, the strengths of the synapses connecting the two neurons increase, which is sometimes stated colloquially as, “Neurons that fire together, wire together.” Rosenblatt called his primitive neurons perceptrons, and he created a rudimentary classifier, aimed at imitating biological decision making, from a network of perceptrons; see Figure 2. This was the first artificial neural network that could carry out a non-trivial task.

As the foregoing historical outline indicates, the brain-as-computer metaphor was solidly in place by the end of the 1950s. It rested on a variety of technical specifications of the notions that (1) logical thinking is a form of information processing, (2) information processing is the purpose of computer programs, while (3) information processing may be implemented by neural systems (explicitly in the case of the McCulloch-Pitts model and its descendants, but implicitly otherwise). A crucial recapitulation of the information-processing framework, given later by David Marr (Marr 1982), distinguished three levels of analysis: computation (“What is the goal of the computation, why is it appropriate, and what is the logic of the strategy by which it can be carried out?”), algorithm (“What is the representation for the input and output, and what is the algorithm for the transformation?”), and implementation (“How can the representation and algorithm be realized physically?”). This remains a very useful way to categorize descriptions of brain computation.

1.2. Neurons as electrical circuits

A rather different line of mathematical work, more closely related to neurobiology, had to do with the electrical properties of neurons. So-called “animal electricity” had been observed by Galvani in 1791 (Galvani and Aldini 1792). The idea that the nervous system was made up of individual neurons was put forth by Cajal in 1886, the synaptic basis of communication across neurons was established by Sherrington in 1897 (Sherrington 1897), and the notion that neurons were electrically excitable in a manner similar to a circuit involving capacitors and resistors in parallel was proposed by Hermann in 1905 (Piccolino 1998). In 1907, Lapicque gave an explicit solution to the resulting differential equation, in which the key constants could be determined from data, and he compared what is now known as the leaky integrate-and-fire model (LIF) with his own experimental results (Abbott 1999; Brunel and Van Rossum 2007; Lapicque 1907). This model, and variants of it, remain in use today (Gerstner et al. 2014), and we return to it in Section 2 (see Figure 3). Then, a series of investigations by Adrian and colleagues established the “all or nothing” nature of the AP, so that increasing a stimulus intensity does not change the voltage profile of an AP but, instead, increases the neural firing rate (Adrian and Zotterman 1926). The conception that stimulus or behavior is related to firing rate has become ubiquitous in neurophysiology. It is often called rate coding, in contrast to temporal coding, which involves the information carried in the precise timing of spikes (Abeles 1982; Shadlen and Movshon 1999; Singer 1999).

Rate coding: stimulus or behavior changes firing rate

Temporal coding: stimulus or behavior changes precise timing of spikes
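A minimal sketch of the leaky integrate-and-fire idea may help connect the resistor-capacitor circuit to rate coding. The code assumes the standard textbook form tau dV/dt = -(V - V_rest) + R I with a threshold and reset; the article does not state the equation in this passage, so the equation form, the function names, and every parameter value below are assumptions made for illustration.

```python
def simulate_lif(current, duration=0.2, dt=1e-4, tau=0.02, resistance=1.0,
                 v_rest=-65.0, v_reset=-70.0, v_threshold=-50.0):
    """Forward-Euler simulation of tau * dV/dt = -(V - v_rest) + R * I.

    Voltages are in mV and times in seconds; `resistance * current` is the
    steady-state depolarization in mV. Returns the list of spike times.
    """
    voltage, spike_times = v_rest, []
    for step in range(int(round(duration / dt))):
        voltage += dt / tau * (-(voltage - v_rest) + resistance * current)
        if voltage >= v_threshold:
            spike_times.append((step + 1) * dt)  # a reset marks an action potential
            voltage = v_reset
    return spike_times


def firing_rate(current, duration=1.0):
    """Spikes per second produced by a constant input current."""
    return len(simulate_lif(current, duration=duration)) / duration


if __name__ == "__main__":
    for current in (10.0, 20.0, 40.0):
        print(f"input {current:5.1f} -> {firing_rate(current):6.1f} spikes/s")
```

With these assumed values an input of 10 never reaches threshold, while larger inputs produce spikes at a rate that grows with the input. The spike shape itself plays no role, which is the sense in which stimulus intensity is carried by firing rate.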

Following these fundamental descriptions, remaining puzzles about the details of action potential generation led to investigations by several neurophysiologists and, ultimately, to one of the great scientific triumphs, the Hodgkin-Huxley model. Published in 1952 (Hodgkin and Huxley 1952), the model consisted of a differential equation for the neural membrane potential (in the squid giant axon) together with three subsidiary differential equations for the dynamic properties of the sodium and potassium ion channels. See Figure 4. This work produced accurate predictions of the time courses of membrane conductances; the form of the action potential; the change in action potential form with varying concentrations of sodium; the number of sodium ions involved in inward flux across the membrane; the speed of action potential propagation; and the voltage curves for sodium and potassium ions (Hille 2001; Hodgkin and Huxley 1952). Thus, by the time the brain-as-computer metaphor had been established, the power of biophysical modeling had also been demonstrated. Over the past 60 years, the Hodgkin-Huxley equations have been refined, but the model’s fundamental formulation has endured, and serves as the basis for many present-day models of single neuron activity; see Section 2.2.
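The structure of the model, one voltage equation plus three gating equations, can be sketched numerically. The parameter values and rate functions below are the commonly quoted textbook ones in the modern sign convention; they are assumed here rather than taken from this article, and the helper names are hypothetical. Forward Euler integration is used only for brevity.

```python
import math

# Commonly quoted squid-axon parameters (assumed; units: mV, ms, uA/cm^2,
# mS/cm^2, uF/cm^2).
C_M, G_NA, G_K, G_L = 1.0, 120.0, 36.0, 0.3
E_NA, E_K, E_L = 50.0, -77.0, -54.4


def _ratio(x, scale):
    """x / (1 - exp(-x / scale)), replaced by its limit `scale` near x = 0."""
    if abs(x) < 1e-7:
        return scale
    return x / (1.0 - math.exp(-x / scale))


def gate_rates(v):
    """Opening (alpha) and closing (beta) rates of the m, h, and n gates."""
    return {
        "m": (0.1 * _ratio(v + 40.0, 10.0), 4.0 * math.exp(-(v + 65.0) / 18.0)),
        "h": (0.07 * math.exp(-(v + 65.0) / 20.0),
              1.0 / (1.0 + math.exp(-(v + 35.0) / 10.0))),
        "n": (0.01 * _ratio(v + 55.0, 10.0), 0.125 * math.exp(-(v + 65.0) / 80.0)),
    }


def simulate_hh(current, duration=50.0, dt=0.01, v0=-65.0):
    """Integrate the four coupled equations; return the voltage trace in mV."""
    v = v0
    # Start each gate at its steady-state value alpha / (alpha + beta).
    gates = {k: a / (a + b) for k, (a, b) in gate_rates(v).items()}
    trace = []
    for _ in range(int(round(duration / dt))):
        m, h, n = gates["m"], gates["h"], gates["n"]
        i_na = G_NA * m ** 3 * h * (v - E_NA)   # sodium current
        i_k = G_K * n ** 4 * (v - E_K)          # potassium current
        i_leak = G_L * (v - E_L)                # leak current
        dv = (current - i_na - i_k - i_leak) / C_M
        for k, (a, b) in gate_rates(v).items():
            gates[k] += dt * (a * (1.0 - gates[k]) - b * gates[k])
        v += dt * dv
        trace.append(v)
    return trace


def count_spikes(trace, level=0.0):
    """Count upward crossings of `level` (mV) in a voltage trace."""
    return sum(1 for a, b in zip(trace, trace[1:]) if a < level <= b)


if __name__ == "__main__":
    print("spikes with no input:", count_spikes(simulate_hh(0.0)))
    print("spikes with input 10:", count_spikes(simulate_hh(10.0)))
```

Unlike a threshold-and-reset model, the action potential here is not imposed by a rule: it emerges from the interaction of the fast sodium activation gate with the slower sodium inactivation and potassium gates.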


[Diagrams. Left (AND): inputs $x_1$ and $x_2$, each with weight 1, feed a summation unit $\Sigma$ with threshold $> 1$ and output $y$. Right (Perceptron): inputs $1, x_1, x_2, x_3$ with weights $w_0, w_1, w_2, w_3$ feed a summation unit $\Sigma$ with threshold $> c$ and output $y$.]

Figure 2

In the left diagram, McCulloch-Pitts neurons $x_1$ and $x_2$ each send binary activity to neuron $y$ using the rule $y = 1$ if $x_1 + x_2 > 1$ and $y = 0$ otherwise; this corresponds to the logical AND operator; other logical operators NOT, OR, NOR may be similarly implemented by thresholding. In the right diagram, the general form of output is based on thresholding linear combinations, i.e., $y = 1$ when $\sum_i w_i x_i > c$ and $y = 0$ otherwise. The values $w_i$ are called synaptic weights. However, because networks of perceptrons (and their more modern artificial neural network descendants) are far simpler than networks in the brain, each artificial neuron corresponds conceptually not to an individual neuron in the brain but, instead, to large collections of neurons in the brain.
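The thresholding rules in this caption are simple enough to write out directly. The sketch below implements the stated AND rule and the general weighted-sum rule; the OR, NOT, and majority-vote examples use weights and thresholds chosen here for illustration and are not taken from the article.

```python
def threshold_unit(inputs, weights, c):
    """Return 1 if the weighted sum of the inputs exceeds c, and 0 otherwise."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total > c else 0


def mcp_and(x1, x2):
    # The rule from the caption: y = 1 if x1 + x2 > 1.
    return threshold_unit((x1, x2), (1, 1), 1)


def mcp_or(x1, x2):
    # Illustrative choice: same weights, lower threshold (x1 + x2 > 0).
    return threshold_unit((x1, x2), (1, 1), 0)


def mcp_not(x1):
    # Illustrative choice: a negative weight, so y = 1 only when x1 = 0.
    return threshold_unit((x1,), (-1,), -1)


def majority_perceptron(x1, x2, x3):
    # Perceptron form with a constant input 1 (weight w0 = 0 here) and c = 1.5.
    return threshold_unit((1, x1, x2, x3), (0, 1, 1, 1), 1.5)


if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            print(a, b, "AND:", mcp_and(a, b), "OR:", mcp_or(a, b))
```

Only the weights and the threshold differ between the logical operators, which is why a rule for adjusting weights is enough to change what such a unit computes.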

1.3. Receptive fields and tuning curves

In early recordings from the optic nerve of the Limulus (horseshoe crab), Hartline found that shining a light on the eye could drive individual neurons to fire, and that a neuron’s firing rate increased with the intensity of the light (Hartline and Graham 1932). He called the location of the light that drove the neuron to fire the neuron’s receptive field. In primary visual cortex (known as area V1), the first part of cortex to get input from the retina, Hubel and Wiesel showed that bars of light moving across a particular part of the visual field, again labeled the receptive field, could drive a particular neuron to fire and, furthermore, that the orientation of the bar of light was important: many neurons were driven to fire most rapidly when the bar of light moved in one direction, and fired much more slowly when the orientation was rotated 90 degrees away (Hubel and Wiesel 1959). When firing rate is considered as a function of orientation, this function has come to be known as a tuning curve (Dayan and Abbott 2001). More recently, the terms “receptive field” and “tuning curve” have been generalized to refer to non-spatial features that drive neurons to fire. The notion of tuning curves, which could involve many dimensions of tuning simultaneously, is widely applied in computational neuroscience.

Tuning curve: the trial-averaged firing rate of a neuron considered as a function of one or more variables
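As a concrete illustration of this definition, the sketch below averages spike counts over repeated trials at each stimulus orientation. The data layout (pairs of orientation and spike count), the function name, and the example numbers are hypothetical; real analyses would typically also report trial-to-trial variability.

```python
from collections import defaultdict


def tuning_curve(trials, window=1.0):
    """Trial-averaged firing rate (spikes/s) for each stimulus orientation.

    `trials` is an iterable of (orientation_in_degrees, spike_count) pairs,
    one per trial; `window` is the counting window in seconds.
    """
    total_spikes = defaultdict(float)
    n_trials = defaultdict(int)
    for orientation, count in trials:
        total_spikes[orientation] += count
        n_trials[orientation] += 1
    return {ori: total_spikes[ori] / (n_trials[ori] * window)
            for ori in sorted(total_spikes)}


if __name__ == "__main__":
    # Made-up counts in a 0.5 s window: the neuron prefers 0 degrees.
    example = [(0, 12), (0, 8), (45, 6), (90, 2), (90, 4)]
    print(tuning_curve(example, window=0.5))
```

The same averaging applies when the stimulus variable is not spatial, or when there are several variables, in which case the dictionary keys would be tuples of stimulus values.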

1.4. Networks

Figure 3

(a) The LIF model is motivated by an equivalent circuit. The capacitor represents the cell membrane through which ions cannot pass. The resistor represents channels in the membrane (through which ions can pass) and the battery a difference in ion concentration across the membrane. (b) The equivalent circuit motivates the differential equation that describes voltage dynamics (gray box). When the voltage reaches a threshold value ($V_{\text{threshold}}$), it is reset to a smaller value ($V_{\text{reset}}$). In this model, the occurrence of a reset indicates an action potential; the rapid voltage dynamics of action potentials are not included in the model. (c) An example trace of the LIF model voltage (blue). When the input current ($I$) is large enough, the voltage increases until reaching the voltage threshold (red horizontal line), at which time the voltage is set to the reset voltage (green horizontal line). The times of reset are labeled as “AP”, denoting action potential. In the absence of an applied current ($I = 0$) the voltage approaches a stable equilibrium value ($V_{\text{rest}}$).

Neuron-like artificial neural networks, advancing beyond perceptron networks, were developed during the 1960s and 1970s, especially in work on associative memory (Amari 1977b), where a memory is stored as a pattern of activity that can be recreated by a stimulus when it provides even a partial match to the pattern. To describe a given activation pattern, Hopfield applied statistical physics tools to introduce an energy function and showed that a simple update rule would decrease the energy so that the network would settle to a pattern-matching “attractor” state (Hopfield 1982). Hopfield’s network model is an example of what statisticians call a two-way interaction model for $N$ binary variables, where the energy function becomes the negative log-likelihood function. Hinton and Sejnowski provided a stochastic mechanism for optimization and the interpretation that a posterior distribution was being maximized, calling their method a Boltzmann machine because the probabilities they used were those of the Boltzmann distribution in statistical mechanics (Hinton and Sejnowski 1983). Geman and Geman then provided a rigorous analysis together with their reformulation in terms of the Gibbs sampler (Geman and Geman 1984). Additional tools from statistical mechanics were used to calculate memory capacity and other properties of memory retrieval (Amit et al. 1987), which created further interest in these models among physicists.
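The energy-decreasing update rule can be seen in a very small example. This sketch assumes the usual conventions, which the passage above does not spell out: units take values +1 or -1, weights are set by a Hebbian outer-product rule, the energy is E = -(1/2) sum over i, j of w_ij s_i s_j, and units are updated one at a time. The patterns and function names are illustrative.

```python
def train_hopfield(patterns):
    """Hebbian outer-product weights for patterns with entries +1 or -1."""
    n = len(patterns[0])
    weights = [[0.0] * n for _ in range(n)]
    for pattern in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:  # no self-connections
                    weights[i][j] += pattern[i] * pattern[j] / n
    return weights


def energy(weights, state):
    """E = -1/2 * sum_ij w_ij s_i s_j (assumed standard form)."""
    n = len(state)
    return -0.5 * sum(weights[i][j] * state[i] * state[j]
                      for i in range(n) for j in range(n))


def recall(weights, cue, sweeps=5):
    """Asynchronous updates: each unit takes the sign of its input field."""
    state = list(cue)
    for _ in range(sweeps):
        for i in range(len(state)):
            field = sum(weights[i][j] * state[j] for j in range(len(state)))
            state[i] = 1 if field >= 0 else -1
    return state


if __name__ == "__main__":
    stored = [[1, 1, 1, 1, -1, -1, -1, -1],
              [1, -1, 1, -1, 1, -1, 1, -1]]
    w = train_hopfield(stored)
    cue = [-1, 1, 1, 1, -1, -1, -1, -1]  # first pattern with one unit flipped
    result = recall(w, cue)
    print(result, energy(w, cue), energy(w, result))
```

Starting from the corrupted cue, the network settles on the first stored pattern, and the energy of the final state is lower than that of the cue, which is the attractor behavior described above.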

Figure 4

The Hodgkin-Huxley model provides a mathematical description of a neuron’s voltage dynamics in terms of changes in sodium (Na+) and potassium (K+) ion concentrations. The cartoon in (a) illustrates a cell body with membrane channels through which (Na+) and (K+) may pass. The model consists of four coupled nonlinear differential equations (b) that describe the voltage dynamics ($V$), which vary according to an input current ($I$), a potassium current, a sodium current, and a leak current. The conductances of the potassium ($n$) and sodium currents ($m$, $h$) vary in time, which controls the flow of sodium and potassium ions through the neural membrane. Each channel’s dynamics depends on (c) a steady state function and a time constant. The steady state functions range from 0 to 1, where 0 indicates that the channel is closed (so that ions cannot pass), and 1 indicates that the channel is open (ions can pass). One might visualize these channels as gates that swing open and closed, allowing ions to pass or impeding their flow; these gates are indicated in green and red in the cartoon (a). The steady state functions depend on the voltage; the vertical dashed line indicates the typical resting voltage value of a neuron. The time constants are less than 10 ms, and smallest for one component of the sodium channel (the sodium activation gate $m$). (d) During an action potential, the voltage undergoes a rapid depolarization ($V$ increases) and then less rapid hyperpolarization ($V$ decreases), supported by the opening and closing of the membrane channels.

Artificial neural networks gained traction as models of human cognition through a series of developments in the 1980s (Medler 1998), producing the paradigm of parallel distributed processing (PDP). PDP models are multi-layered networks of nodes resembling those of their perceptron precursor, but they are interactive, or recurrent, in the sense that they are not necessarily feed-forward: connections between nodes can go in both directions, and they may have structured inhibition and excitation (Rumelhart et al. 1986). In addition, training (i.e., estimating parameters by minimizing an optimization criterion such as the sum of squared errors across many training examples) is done by a form of gradient descent known as back propagation (because iterations involve steps backward from output errors toward input weights). While the nodes within these networks do not correspond to individual neurons, features of the networks, including back propagation, are usually considered to be biologically plausible. For example, synaptic connections between biological neurons are plastic, and change their strength following rules consistent with theoretical models (e.g., Hebb’s rule). Furthermore, PDP models can reproduce many behavioral phenomena, famously including generation of past tense for English verbs and making childlike errors before settling on correct forms (McClelland and Rumelhart 1981). Currently, there is increased interest in neural network models through deep learning, which we will discuss briefly, below.

Analysis of the overall structure of network connectivity, exemplified in research on social networks (see Fienberg (2012) for historical overview), has received much attention following the 1998 observation that several very different kinds of networks, including the neural connectivity in the worm C. elegans, exhibit “small world” properties of short average path length between nodes, together with substantial clustering of nodes, and that these properties may be described by a relatively simple stochastic model (Watts and Strogatz 1998). This style of network description has since been applied in many contexts involving brain measurement, mainly using structural and functional magnetic resonance imaging (MRI) (Bassett and Bullmore 2016; Bullmore and Sporns 2009), though cautions have been issued regarding the difficulty of interpreting results physiologically (Papo et al. 2016).
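The two “small world” summaries, clustering and average path length, can be computed on a toy graph. The sketch below builds a ring lattice and rewires each edge with probability p, which is one common reading of the stochastic model cited above; the details of the rewiring step, the function names, and the graph sizes are assumptions made for illustration.

```python
import random
from collections import deque


def ring_lattice(n, k):
    """Undirected ring in which each node links to its k nearest neighbors (k even)."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for step in range(1, k // 2 + 1):
            j = (i + step) % n
            adj[i].add(j)
            adj[j].add(i)
    return adj


def rewired_lattice(n, k, p, seed=0):
    """Move each lattice edge (i, i + step) to a random new endpoint with probability p."""
    rng = random.Random(seed)
    adj = ring_lattice(n, k)
    for step in range(1, k // 2 + 1):
        for i in range(n):
            j = (i + step) % n
            if rng.random() < p and j in adj[i]:
                candidates = [m for m in range(n) if m != i and m not in adj[i]]
                if candidates:
                    m = rng.choice(candidates)
                    adj[i].discard(j)
                    adj[j].discard(i)
                    adj[i].add(m)
                    adj[m].add(i)
    return adj


def clustering(adj):
    """Average fraction of a node's neighbor pairs that are themselves linked."""
    total = 0.0
    for nbrs in adj.values():
        d = len(nbrs)
        if d >= 2:
            links = sum(1 for a in nbrs for b in nbrs if a < b and b in adj[a])
            total += 2.0 * links / (d * (d - 1))
    return total / len(adj)


def mean_path_length(adj):
    """Average shortest-path length over reachable pairs (breadth-first search)."""
    total = pairs = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


if __name__ == "__main__":
    for p in (0.0, 0.1, 1.0):
        g = rewired_lattice(200, 6, p)
        print(p, round(clustering(g), 3), round(mean_path_length(g), 2))
```

Comparing the printed values across p shows how the two summaries respond to rewiring; in the cited model, a small amount of rewiring is reported to shorten paths considerably while leaving clustering relatively high.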

1.5. Statistical models

Stochastic considerations have been part of neuroscience since the first descriptions of neural activity, outlined briefly above, due to the statistical mechanics underlying the flow of ions across channels and synapses (Colquhoun and Sakmann 1998; Destexhe et al. 1994). Spontaneous fluctuations in a neuron’s membrane potential are believed to arise from the random opening and closing of ion channels, and this spontaneous variability has been analyzed using a variety of statistical methods (Sigworth 1980). Such analysis provides information about the numbers and properties of the ion channel populations responsible for excitability. Probability has also been used extensively in psychological theories of human behavior for more than 100 years, e.g., Stigler (1986, Ch. 7). Especially popular theories used to account for behavior include Bayesian inference and reinforcement learning, which we will touch on below. A more recent interest is to determine signatures of statistical algorithms in neural function. For example, drifting diffusion to a threshold, which is used with LIF models (Tuckwell 1988), has also been used to describe models of decision making based on neural recordings (Gold and Shadlen 2007). However, these are all examples of ways that statistical models have been used to describe neural activity, which is very different from the role of statistics in data analysis. Before previewing our treatment of data analytic methods, we describe the types of data that are relevant to this article.

1.6. Recording modalities

Efforts to understand the nervous system must consider both anatomy (its constituents and their connectivity) and function (neural activity and its relationship to the apparent goals of an organism). Anatomy does not determine function, but does strongly constrain it. Anatomical methods range from a variety of microscopic methods to static, whole-brain MRI (Fischl et al. 2002). Functional investigations range across spatial and temporal scales, beginning with recordings from ion channels, to action potentials, to local field potentials (LFPs) due to the activity of many thousands of neural synapses. Functional measurements outside the brain (still reflecting electrical activity within it) come from electroencephalography (EEG) (Nunez and Srinivasan 2006) and magnetoencephalography (MEG) (Hämäläinen et al. 1993), as well as indirect methods that measure a physiological or metabolic parameter closely associated with neural activity, including positron emission tomography (PET) (Bailey et al. 2005), functional MRI (fMRI) (Lazar 2008), and near-infrared spectroscopy (NIRS) (Villringer et al. 1993). These functional methods have timescales spanning milliseconds to minutes, and spatial scales ranging from a few cubic millimeters to many cubic centimeters.

While interesting mathematical and statistical problems arise in nearly every kind of neuroscience data, we focus here on neural spiking activity. Spike trains are sometimes recorded from individual neurons in tissue that has been extracted from an animal and maintained over hours in a functioning condition (in vitro). In this setting, the voltage drop across the membrane is nearly deterministic; then, when the neuron is driven with the same current input on each of many repeated trials, the timing of spikes is often replicated precisely across the trials (Mainen and Sejnowski 1995), as seen in portions of the spike trains in Figure 5. Recordings from brains of living animals (in vivo) show substantial irregularity in spike timing, as in Figure 1. These recordings often come from electrodes that have been inserted into brain tissue near, but not on or in, the neuron generating a resulting spike train; that is, they are extracellular recordings. The data could come from one up to dozens, hundreds, or even thousands of electrodes. Because the voltage on each electrode is due to activity of many nearby neurons, with each neuron contributing its own voltage signature repeatedly, there is an interesting statistical clustering problem known as spike sorting (Carlson et al. 2014; Rey et al. 2015), but we will ignore that here. Another important source of activity, recorded from many individual neurons simultaneously, is calcium imaging, in which light is emitted by fluorescent indicators in response to the flow of calcium ions into neurons when they fire (Grienberger and Konnerth 2012). Calcium dynamics, and the nature of the indicator, limit temporal resolution to between tens and several hundred milliseconds. Signals can be collected using one-photon microscopy even from deep in the brain of a behaving animal; two-photon microscopy provides significantly higher spatial resolution but at the cost of limiting recordings to the brain surface. Due to the temporal smoothing, extraction of spiking data from calcium imaging poses its own set of statistical challenges (Pnevmatikakis et al. 2016).

Neural ﬁring rates vary widely, depending on recording site and physiological circum-

stances, from quiescent (essentially 0 spikes per second) to as many as 200 spikes per second.

The output of spike sorting is a sequence of spike times, typically at a time resolution of 1

millisecond (the approximate width of an AP). While many analyses are based on spike

counts across relatively long time intervals (numbers of spikes that occur in time bins of

tens or hundreds of milliseconds), some are based on the more complete precise timing

information provided by the spike trains.

In some special cases, mainly in networks recorded in vitro, neurons are densely sampled

and it is possible to study the way activity of one neuron directly inﬂuences the activity of

other neurons (Pillow et al. 2008). However, in most experimental settings to date, a very

small proportion of the neurons in the circuit are sampled.


1.7. Data analysis

In experiments involving behaving animals, each experimental condition is typically re-

peated across many trials. On any two trials, there will be at least slight diﬀerences in

behavior, neural activity throughout the brain, and contributions from molecular noise, all

of which result in considerable variability of spike timing. Thus, a spike train may be

regarded as a point process, i.e., a stochastic sequence of event times, with the events being

spikes. We discuss point process modeling below, but note here that the data are typically

recorded as sparse binary time series in 1 millisecond time bins (1 if spike, 0 if no spike).

When spike counts within broader time bins are considered, they may be assumed to form

continuous-valued time series, and this is the framework for some of the methods refer-

enced below. It is also possible to apply time series methods directly to the binary data,

or smoothed versions of them, but see the caution in Kass et al. (2014, Section 19.3.7).
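The two data representations mentioned above (a sparse binary series in 1 millisecond bins, and spike counts in broader bins) can be sketched as follows. The spike times and the 50 millisecond bin width are hypothetical values chosen only for illustration.

```python
import numpy as np

# Hypothetical spike times (in seconds) from one 1-second trial.
spike_times = np.array([0.0123, 0.0471, 0.0502, 0.3311, 0.3342, 0.9004])
trial_length = 1.0  # seconds

def bin_spikes(times, bin_width, t_stop):
    """Count spikes in consecutive bins of width bin_width (seconds)."""
    n_bins = int(round(t_stop / bin_width))
    counts, _ = np.histogram(times, bins=n_bins, range=(0.0, t_stop))
    return counts

binary = bin_spikes(spike_times, 0.001, trial_length)  # 1 ms bins: 0 or 1
counts = bin_spikes(spike_times, 0.050, trial_length)  # 50 ms bins: counts

print(binary.size, binary.sum(), binary.max())
print(counts)
```

With 1 millisecond bins the series is almost entirely zeros, whereas the 50 millisecond counts take values larger than one whenever spikes cluster.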

A common aim is to relate an observed pattern of activity to features of the experimental

stimulus or behavior. However, in some settings predictive approaches are used, often under

the rubric of decoding, in the sense that neural activity is “decoded” to predict the stimulus

or behavior. In this case, tools associated with the ﬁeld of statistical machine learning may

be especially useful (Ventura and Todorova 2015). We omit many interesting questions

that arise in the course of analyzing biological neural networks, such as the distribution of

the post-synaptic potentials that represent synaptic weights (Buzs´aki and Mizuseki 2014;

Teramae et al. 2012).

Data analysis is performed by scientists with diverse backgrounds. Statistical ap-

proaches use frameworks built on probabilistic descriptions of variability, both for inductive

reasoning and for analysis of procedures. The resulting foundation for data analysis has

been called the statistical paradigm (Kass et al. 2014, Section 1.2).

1.8. Components of the nervous system

When we speak of neurons, or brains, we are indulging in sweeping generalities: properties

may depend not only on what is happening to the organism during a study, but also on

the component of the nervous system studied, and the type of animal being used. Popular

organisms in neuroscience include worms, mollusks, insects, ﬁsh, birds, rodents, non-human

primates, and, of course, humans. The nervous system of vertebrates comprises the brain,

the spinal cord, and the peripheral system. The brain itself includes both the cerebral cortex

and sub-cortical areas. Textbooks of neuroscience use varying organizational rubrics, but

major topics include the molecular physiology of neurons, sensory systems, the motor sys-

tem, and systems that support higher-order functions associated with complex and ﬂexible

behavior (Kandel et al. 2013; Swanson 2012). Attempts at understanding computational

properties of the nervous system have often focused on sensory systems: they are more eas-

ily accessed experimentally, controlled inputs to them can be based on naturally occurring

inputs, and their response properties are comparatively simple. In addition, much attention

has been given to the cerebral cortex, which is involved in higher-order functioning.

2. Single Neurons

Mathematical models typically aim to describe the way a given phenomenon arises from

some architectural constraints. Statistical models typically are used to describe what a

particular data set can say concerning the phenomenon, including the strength of evidence.


We very brieﬂy outline these approaches in the case of single neurons, and then review

attempts to bring them together.

2.1. LIF models and their extensions

Originally proposed more than a century ago, the LIF model (Figure 3) continues to serve

an important role in neuroscience research (Abbott 1999). Although LIF neurons are de-

terministic, they often mimic the variation in spike trains of real neurons recorded in vitro,

such as those in Figure 5. In the left panel of that ﬁgure, the same ﬂuctuating current is

applied repeatedly as input to the neuron, and this creates many instances of spike times

that are highly precise in the sense of being replicated across trials; some other spike times

are less precise. Precise spike times occur when a large slope in the input current leads

to wide recruitment of ion channels (Mainen and Sejnowski 1995). Temporal locking of

spikes to high frequency inputs also can be seen in LIF models (Goedeke and Diesmann

2008). Many extensions of the original leaky integrate-and-ﬁre model have been developed

to capture other features of observed neuronal activity (Gerstner et al. 2014), including

more realistic spike initiation through inclusion of a quadratic term, and incorporation of a

second dynamical variable to simulate adaptation and to capture more diverse patterns of

neuronal spiking and bursting. Even though these models ignore the biophysics of action

potential generation (which involves the conductances generated by ion channels, as in the

Hodgkin-Huxley model), they are able to capture the nonlinearities present in several bio-

physical neuronal models (Rotstein 2015). The impact of stochastic eﬀects due to the large

number of synaptic inputs delivered to an LIF neuron has also been extensively studied

using diﬀusion processes (Lansky and Ditlevsen 2008).
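The replication of spike times under a repeated input can be illustrated with a minimal simulation. Because the LIF equation of Figure 3 is not reproduced in this section, the sketch below assumes a common textbook form, tau dV/dt = -(V - V_rest) + R I(t), with a reset at threshold; all parameter values and the "frozen" current are hypothetical.

```python
import numpy as np

def simulate_lif(current, dt=0.1, tau=10.0, v_rest=-65.0, v_thresh=-50.0,
                 v_reset=-65.0, resistance=10.0):
    """Euler integration of tau dV/dt = -(V - v_rest) + R * I(t) with reset.
    Assumed units: ms, mV, MOhm, nA. Returns spike times in ms."""
    v = v_rest
    spikes = []
    for step, i_t in enumerate(current):
        v += dt / tau * (-(v - v_rest) + resistance * i_t)
        if v >= v_thresh:
            spikes.append(step * dt)
            v = v_reset
    return np.array(spikes)

rng = np.random.default_rng(0)
n_steps = 5000  # 500 ms at dt = 0.1 ms
# Frozen fluctuating current: the same waveform is reused on every trial.
white = rng.normal(size=n_steps)
smooth = np.convolve(white, np.ones(20) / 20, mode="same")
current = 2.0 + 3.0 * smooth  # nA

trial_a = simulate_lif(current)
trial_b = simulate_lif(current)
print(len(trial_a), np.array_equal(trial_a, trial_b))
```

In this noise-free sketch the two trials agree exactly; the partial imprecision seen in real recordings would require adding a trial-specific noise term to the current.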

2.2. Biophysical models

There are many extensions of the Hodgkin and Huxley framework outlined in Figure 4.

These include models that capture additional biological features, such as additional ionic

currents (Somjen 2004), and aspects of the neuron’s extracellular environment (Wei et al.

2014), both of which introduce new fast and slow timescales to the dynamics. Contributions

due to the extensive dendrites (which receive inputs to the neuron) have been simulated

in detailed biophysical models (Rall 1962). While increased biological realism necessitates

additional mathematical complexity, especially when large populations of neurons are con-

sidered, the Hodgkin-Huxley model and its extensions remain fundamental to computational

neuroscience research (Markram et al. 2015; Traub et al. 2005).

Simpliﬁed mathematical models of single neuron activity have facilitated a dynamical

understanding of neural behavior. The Fitzhugh-Nagumo model is a purely phenomenolog-

ical model, based on geometric and dynamic principles, and not directly on the neuron’s

biophysics (Fitzhugh 1960; Nagumo et al. 1962). Because of its low dimensionality, it is

amenable to phase-plane analysis using dynamical systems tools (e.g., examining the null-

clines, equilibria and trajectories).
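A phase-plane analysis of this kind can be sketched numerically. The parameterization below is one common form of the Fitzhugh-Nagumo equations and is an assumption (conventions vary across references); the code locates the intersection of the two nullclines and classifies its stability from the Jacobian.

```python
import numpy as np

# One common parameterization (an assumption; conventions vary):
#   dv/dt = v - v**3 / 3 - w + i_ext
#   dw/dt = eps * (v + a - b * w)
a, b, eps, i_ext = 0.7, 0.8, 0.08, 0.0

def v_nullcline(v):
    """w values where dv/dt = 0 (a cubic curve)."""
    return v - v**3 / 3 + i_ext

def w_nullcline(v):
    """w values where dw/dt = 0 (a straight line)."""
    return (v + a) / b

# Equilibria are intersections of the nullclines, i.e. real roots of
#   -v^3/3 + (1 - 1/b) v + (i_ext - a/b) = 0
coeffs = [-1.0 / 3.0, 0.0, 1.0 - 1.0 / b, i_ext - a / b]
roots = np.roots(coeffs)
v_eq = roots[np.abs(roots.imag) < 1e-9].real
w_eq = w_nullcline(v_eq)

# Local stability from the eigenvalues of the Jacobian at each equilibrium.
stable = []
for v0 in v_eq:
    jac = np.array([[1.0 - v0**2, -1.0], [eps, -eps * b]])
    stable.append(bool(np.all(np.linalg.eigvals(jac).real < 0)))

print(v_eq, w_eq, stable)
```

For these parameter values there is a single equilibrium, near v = -1.2, and it is stable; increasing `i_ext` moves the intersection along the cubic nullcline and can change this classification.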

An alternative approach is to simplify the equations of a detailed neuronal model in

ways that retain a biophysical interpretation (Ermentrout and Terman 2010). For example,

by making a steady-state approximation for the fast ionic sodium current activation in the

Hodgkin-Huxley model (m in Figure 4), and recasting two of the gating variables (n and h),

it is possible to simplify the original Hodgkin-Huxley model to a two-dimensional model,

which can be investigated more easily in the phase plane (Gerstner et al. 2014). The de-


velopment of simpliﬁed models is closely interwoven with bifurcation theory and the theory

of normal forms within dynamical systems (Izhikevich 2007). One well-studied reduction

of the Hodgkin-Huxley equations to a 2-dimensional conductance-based model was devel-

oped by John Rinzel (Rinzel 1985). In this case, the geometries of the phenomenological

Fitzhugh-Nagumo model and the simpliﬁed Rinzel model are qualitatively similar. Yet an-

other approach to dimensionality reduction consists of neglecting the spiking currents (fast

sodium and delayed-rectifying potassium) and considering only the currents that are active

in the sub-threshold regime (Rotstein et al. 2006). This cannot be done in the original

Hodgkin-Huxley model, because the only ionic currents are those that lead to spikes, but

it is useful in models that include additional ionic currents in the sub-threshold regime.

2.3. Point process regression models of single neuron activity

Mathematically, the simplest model for an irregular spike train is a homogeneous Poisson

process, for which the probability of spiking within a time interval (t, t + ∆t], for small ∆t,

may be written

$$P(\text{spike in } (t, t + \Delta t]) \approx \lambda \Delta t,$$

where λ represents the firing rate of the neuron and where disjoint intervals have independent

spiking. This model, however, is often inadequate, for several reasons. For one thing,

neurons have noticeable refractory periods following a spike, during which the probability of

spiking goes to zero (the absolute refractory period) and then gradually increases, often over

tens of milliseconds (the relative refractory period). In this sense neurons exhibit memory

eﬀects, often called spike history eﬀects. To capture those, and many other physiological

eﬀects, more general point processes must be used. We outline the key ideas underlying

point process modeling of spike trains.
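The homogeneous Poisson approximation, and its lack of a refractory period, can be checked with a short simulation. This is a sketch under assumed values (a rate of 20 spikes per second, 1 millisecond bins): each bin receives an independent spike with probability equal to the rate times the bin width.

```python
import numpy as np

rng = np.random.default_rng(1)
rate = 20.0        # spikes per second (lambda), an arbitrary choice
dt = 0.001         # 1 ms bins, in seconds
duration = 200.0   # seconds

# Independent Bernoulli draws with success probability lambda * dt.
n_bins = int(round(duration / dt))
spikes = rng.random(n_bins) < rate * dt

# Inter-spike intervals, measured in bins (1 bin = 1 ms).
isi_bins = np.diff(np.flatnonzero(spikes))

est_rate = spikes.sum() / duration
cv = isi_bins.std() / isi_bins.mean()
adjacent_fraction = np.mean(isi_bins == 1)

print(est_rate, cv, adjacent_fraction)
```

The estimated rate is close to 20 spikes per second and the coefficient of variation of the intervals is close to 1, as expected for a Poisson process. Roughly 2% of the intervals are a single millisecond long, which a neuron with a refractory period would essentially never produce.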

Absolute refractory period: after a neuron fires, the sodium channels are unable to open for approximately 1 millisecond.

Relative refractory period: after the hard refractory period, a neuron's probability of firing gradually increases from zero.

As we indicated in Section 1.2, a fundamental result in neurophysiology is that neurons

respond to a stimulus or contribute to an action by increasing their ﬁring rates. The

measured ﬁring rate of a neuron within a time interval would be the number of spikes in the

interval divided by the length of the interval (usually in units of seconds, so that the ratio

is in spikes per second, abbreviated as Hz, for Hertz). The point process framework centers

on the theoretical instantaneous ﬁring rate, which takes the expected value of this ratio

and passes to the limit as the length of the time interval goes to zero, giving an intensity

function for the process. To accurately model a neuron’s spiking behavior, however, the

intensity function typically must itself evolve over time depending on changing inputs and

experimental conditions, the recent past spiking behavior of the neuron, the behavior of

other neurons, the behavior of local ﬁeld potentials, etc. It is therefore called a conditional

intensity function and may be written in the form

$$\lambda(t \mid x_t) = \lim_{\Delta t \to 0} \frac{E\left(N_{(t,t+\Delta t]} \mid X_t = x_t\right)}{\Delta t}$$

where $N_{(t,t+\Delta t]}$ is the number of spikes in the interval $(t, t + \Delta t]$ and where the vector $X_t$

includes both the past spiking history $H_t$ prior to time $t$ and also any other quantities that

affect the neuron's current spiking behavior. In some special cases, the conditional intensity

will be deterministic, but in general, because $X_t$ is random, the conditional intensity is also

random. If $X_t$ includes unobserved random variables, the process is often called doubly

stochastic. When the conditional intensity depends on the history $H_t$, the process is often

called self-exciting (though the effects may produce an inhibition of firing rate rather than

an excitation). The vector $X_t$ may be high-dimensional. A mathematically tractable special

case, where contributions to the intensity due to previous spikes enter additively in terms

of a ﬁxed kernel function, is the Hawkes process.

As a matter of interpretation, in suﬃciently small time intervals the spike count is either

zero or one, so we may replace the expectation with the probability of spiking and get

$$P(\text{spike in } (t, t + \Delta t] \mid X_t = x_t) \approx \lambda(t \mid x_t)\,\Delta t.$$

A statistical model for a spike train involves two things: (1) a simple, universal formula

for the probability density of the spike train in terms of the conditional intensity function

(which we omit here) and (2) a speciﬁcation of the way the conditional intensity function

depends on variables xt. An analogous statement is also true for multiple spike trains,

possibly involving multiple neurons. Thus, when the data are resolved down to individual

spikes, statistical analysis is primarily concerned with modeling the conditional intensity

function in a form that can be implemented eﬃciently and that ﬁts the data adequately

well. That is, writing

$$\lambda(t \mid x_t) = f(x_t), \tag{1}$$

the challenge is to identify within the variable $x_t$ all relevant effects, or features, in the

terminology of machine learning, and then to find a suitable form for the function $f$, keeping

in mind that, in practice, the dimension of $x_t$ may range from 1 to many millions. This

identification of the components of $x_t$ that modulate the neuron's firing rate is a key step

in interpreting the function of a neural system. Details may be found in Kass et al. (2014,

Chapter 19), but see Amarasingham et al. (2015) for an important caution about the

interpretation of neural ﬁring rate through its representation as a point process intensity

function.

A statistically tractable non-Poisson form involves log-additive models, the simplest

case being

$$\log \lambda(t \mid x_t) = \log \lambda(t \mid H_t) = \log g_0(t) + \log g_1(t - s_*(t)) \tag{2}$$

where $s_*(t)$ is the time of the immediately preceding spike, and $g_0$ and $g_1$ are functions that

may be written in terms of some basis (Kass and Ventura 2001). To include contributions

from spikes that are earlier than the immediately preceding one, the term $\log g_1(t - s_*(t))$

is replaced by a sum of terms of the form $\log g_{1j}(t - s_j(t))$, where $s_j(t)$ is the $j$-th spike

back in time preceding $t$, and a common simplification is to assume the functions $g_{1j}$ are all

equal to a single function $g_1$ (Pillow et al. 2008). The resulting probability density function

for the set of spike times (which deﬁnes the likelihood function) is very similar to that of

a Poisson generalized linear model (GLM) and, in fact, GLM software may be used to ﬁt

many point process models (Kass et al. 2014, Chapter 19). The use of the word “linear” may

be misleading here because highly nonlinear functions may be involved, e.g., in Equation

(2), $g_0$ and $g_1$ are typically nonlinear. An alternative is to call these point process regression

models. Nonetheless, the model in (2) is often said to specify a GLM neuron, as are other

point process regression models.
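To make the connection with GLM fitting concrete, the following sketch simulates a spike train from a discretized version of Equation (2) and recovers the coefficients by Poisson regression. Everything specific here is an assumption made for illustration: a constant $g_0$ (40 spikes per second), a piecewise-constant $\log g_1$ on three lag ranges, 1 millisecond bins, and a hand-written Newton iteration standing in for GLM software.

```python
import numpy as np

dt = 0.001  # 1 ms bins, in seconds
# Assumed true coefficients: log baseline rate, then log g1 on three lag ranges.
beta_true = np.array([np.log(40.0), -3.0, -1.5, -0.5])

def history_row(since):
    """Covariates for a bin whose last spike was `since` bins ago."""
    row = np.zeros(4)
    row[0] = 1.0                 # log g0: constant baseline
    if since <= 2:
        row[1] = 1.0             # lags of 1-2 ms
    elif since <= 5:
        row[2] = 1.0             # lags of 3-5 ms
    elif since <= 10:
        row[3] = 1.0             # lags of 6-10 ms
    return row                   # longer lags: log g1 = 0

def simulate(beta, n_bins, rng):
    X = np.zeros((n_bins, 4))
    y = np.zeros(n_bins)
    since = 1000                 # no recent spike at the start
    for t in range(n_bins):
        X[t] = history_row(since)
        p = np.exp(X[t] @ beta) * dt   # P(spike) is about lambda * dt
        if rng.random() < p:
            y[t] = 1.0
            since = 1
        else:
            since += 1
    return X, y

def fit_poisson(X, y, n_iter=50):
    """Newton's method for Poisson regression with offset log(dt)."""
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean() / dt)
    for _ in range(n_iter):
        mu = np.exp(X @ beta) * dt
        step = np.linalg.solve(X.T @ (X * mu[:, None]), X.T @ (y - mu))
        beta += step
        if np.max(np.abs(step)) < 1e-8:
            break
    return beta

rng = np.random.default_rng(2)
X, y = simulate(beta_true, 200_000, rng)
beta_hat = fit_poisson(X, y)
print(np.round(beta_true, 2), np.round(beta_hat, 2))
```

The fitted coefficients are negative at short lags, reproducing the refractory suppression built into the simulation. In practice a smooth basis for $g_0$ and $g_1$ would usually replace the indicator covariates used here.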

2.4. Point process regression and leaky integrate-and-ﬁre models

Assuming excitatory and inhibitory Poisson process inputs to an LIF neuron, the distribu-

tion of waiting times for a threshold crossing, which corresponds to the inter-spike interval

(ISI), is found to be inverse Gaussian (Tuckwell 1988) and this distribution often provides a


good ﬁt to experimental data when neurons are in steady state, as when they are isolated in

vitro and spontaneous activity is examined (Gerstein and Mandelbrot 1964). The inverse

Gaussian distribution, within a biologically-reasonable range of coefficients of variation,

turns out to be qualitatively very similar to ISI distributions generated by processes given

by Equation (2). Furthermore, spike trains generated from LIF models can be ﬁtted well

by these GLM-type models (Kass et al. 2014, Section 19.3.4 and references therein).
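The inverse Gaussian result can be checked by simulation in its simplest setting. The sketch below assumes an integrator without leak, driven by a constant mean input plus Gaussian white noise (the diffusion limit of many small excitatory and inhibitory inputs), for which the first-passage time to threshold is exactly inverse Gaussian with mean threshold/drift and coefficient of variation sigma / sqrt(threshold * drift). All numerical values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
threshold = 1.0   # distance from reset to threshold (arbitrary units)
drift = 0.05      # mean input per ms
sigma = 0.1       # input noise amplitude per sqrt(ms)
dt, t_max, n_trials = 0.1, 200.0, 1000

# Each row is one trial: a random walk with drift, started at 0.
n_steps = int(round(t_max / dt))
increments = drift * dt + sigma * np.sqrt(dt) * rng.normal(size=(n_trials, n_steps))
paths = np.cumsum(increments, axis=1)

# First time step at which each path reaches threshold.
crossed = paths >= threshold
reached = crossed.any(axis=1)
isi = (crossed.argmax(axis=1)[reached] + 1) * dt   # ms

mean_theory = threshold / drift
cv_theory = sigma / np.sqrt(threshold * drift)
cv_sim = isi.std() / isi.mean()
print(isi.mean(), mean_theory, cv_sim, cv_theory)
```

The simulated mean interval (about 20 ms) and coefficient of variation (about 0.45) agree with the inverse Gaussian values up to sampling and discretization error. Adding a leak term changes the first-passage distribution, so the agreement is then only approximate.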

An additional connection between LIF and GLM neurons comes from considering the

response of neurons to injected currents, as illustrated in Figure 5. In this context, the ﬁrst

term in Equation (2) may be rewritten as a convolution with the current I(t) at time t, so

that (2) becomes

$$\log \lambda(t \mid x_t) = \log \lambda(t \mid H_t, I_t) = \int_0^\infty g_0(s)\, I(t-s)\, ds + \log g_1(t - s_*(t)). \tag{3}$$

Figure 5 shows the estimate of $g_0$ that results from fitting this model to data illustrated

in that figure. Here, the function $g_0$ is often called a stimulus filter. On the other hand,

following Gerstner et al. (2014, Chapter 6), we may write a generalized version of LIF in

integral form,

$$V(t) = V_{\text{rest}} + \int_0^\infty g_0(s)\, I(t-s)\, ds + \log g_1(t - s_*(t)) \tag{4}$$

which those authors call a Spike Response Model (SRM). By equating the log conditional

intensity to voltage in (4),

$$\log \lambda(t \mid H_t, I_t) = V(t) - V_{\text{rest}}$$

we thereby get a modiﬁed LIF neuron that is also a GLM neuron (Paninski et al. 2009).

Thus, both theory and empirical study indicate that GLM and LIF neurons are very similar,

and both describe a variety of neural spiking patterns (Weber and Pillow 2016).
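Equation (3) can be sketched in discrete time as a convolution of the injected current with a stimulus filter, plus a post-spike term. The exponential filter, the recovery function, and the constant `bias` that sets the baseline rate are all assumptions introduced for this illustration; they are not estimates from the data in Figure 5.

```python
import numpy as np

rng = np.random.default_rng(4)
dt = 1.0                              # ms
n_bins = 2000
current = rng.normal(size=n_bins)     # hypothetical injected current I(t)

lags = np.arange(50)                  # filter support: 0-49 ms
g0 = 0.3 * np.exp(-lags / 10.0)       # assumed stimulus filter g0(s)
# Causal convolution: drive[t] = sum_s g0[s] * I[t - s] * dt
drive = np.convolve(current, g0)[:n_bins] * dt

def log_g1(u):
    """Assumed recovery term: strong suppression just after a spike."""
    return -5.0 * np.exp(-u / 3.0)

bias = np.log(0.02)                   # assumed baseline of 0.02 spikes per ms
spikes = np.zeros(n_bins, dtype=bool)
last_spike = -np.inf
for t in range(n_bins):
    log_lam = bias + drive[t] + log_g1(t * dt - last_spike)
    if rng.random() < min(np.exp(log_lam) * dt, 1.0):
        spikes[t] = True
        last_spike = t * dt

print(spikes.sum())
```

Reading `bias + drive[t] + log_g1(...)` as a voltage relative to rest gives the Spike Response Model interpretation of Equation (4), while reading it as a log intensity gives the GLM interpretation.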

It is interesting that these empirically-oriented SRMs, and variants that included an

adaptive threshold (Kobayashi et al. 2009), performed better than much more complicated

biophysical models in a series of international competitions for reproducing and predicting

recorded spike times of biological neurons under varying circumstances (Gerstner and Naud

2009).

2.5. Multidimensional models

The one-dimensional LIF dynamic model in Figure 3b is inadequate when interactions of

sub-threshold ion channel dynamics cause a neuron’s behavior to be more complicated than

integration of inputs. Neurons can even behave as diﬀerentiators and respond only to

ﬂuctuations in input. Furthermore, as noted in Sections 1.3 and 2.3, features that drive

neural ﬁring can be multidimensional. Multivariate dynamical systems are able to describe

the ways that interacting, multivariate eﬀects can bring the system to its ﬁring threshold,

as in the Hodgkin-Huxley model (Hong et al. 2007). A number of model variants that aim

to account for such multidimensional eﬀects have been compared in predicting experimental

data from sensory areas (Aljadeﬀ et al. 2016).

2.6. Statistical challenges in biophysical modeling

Conductance-based biophysical models pose problems of model identiﬁability and parame-

ter estimation. The original Hodgkin-Huxley equations (Hodgkin and Huxley 1952) contain


on the order of two dozen numerical parameters describing the membrane capacitance, max-

imal conductances for the sodium and potassium ions, kinetics of ion channel activation and

inactivation, and the ionic equilibrium potentials (at which the ﬂow of ions due to imbal-

ances of concentration across the cell membrane oﬀsets that due to imbalances of electrical

charge). Hodgkin and Huxley arrived at estimates of these parameters through a combina-

tion of extensive experimentation, biophysical reasoning, and regression techniques. Others

have investigated the experimental information necessary to identify the model (Walch and

Eisenberg 2016). In early work, statistical analysis of nonstationary ensemble ﬂuctuations

was used to estimate the conductances of individual ion channels (Sigworth 1977). Fol-

lowing the introduction of single-channel recording techniques (Sakmann and Neher 1984),

which typically report a binary projection of a multistate underlying Markovian ion channel

process, many researchers expanded the theory of aggregated Markov processes to handle

inference problems related to identifying the structure of the underlying Markov process

and estimating transition rate parameters (Qin et al. 1997).

More recently, parameter estimation challenges in biophysical models have been tackled

using a variety of techniques under the rubric of “data assimilation,” where experimental data are

combined with models algorithmically. Data assimilation methods illustrate the interplay

of mathematical and statistical approaches in neuroscience. For example, in Meng et al.

(2014), the authors describe a state space modeling framework and a sequential Monte

Carlo (particle ﬁlter) algorithm to estimate the parameters of a membrane current in the

Hodgkin-Huxley model neuron. They applied this framework to spiking data recorded from

rat layer V cortical neurons, and correctly identiﬁed the dynamics of a slow membrane

current. Variations on this theme include the use of synchronization manifolds for parameter

estimation in experimental neural systems driven by dynamically rich inputs (Meliza et al.

2014), combined statistical and geometric methods (Tien and Guckenheimer 2008), and

other state space models (Vavoulis et al. 2012).
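The sequential Monte Carlo idea can be sketched on a toy state space model. This is not the Hodgkin-Huxley setting of Meng et al. (2014): the linear hidden state, the Gaussian observation noise, and all parameter values below are assumptions chosen to keep the example short, and only the hidden state (not model parameters) is estimated.

```python
import numpy as np

rng = np.random.default_rng(7)
a, q_sd, r_sd = 0.95, 0.3, 1.0
n_steps, n_particles = 300, 1000

# Toy state space model (an assumption, not a biophysical neuron):
#   hidden state  x_t = a * x_{t-1} + N(0, q_sd^2)
#   observation   y_t = x_t + N(0, r_sd^2)
x = np.zeros(n_steps)
for t in range(1, n_steps):
    x[t] = a * x[t - 1] + q_sd * rng.normal()
y = x + r_sd * rng.normal(size=n_steps)

particles = rng.normal(scale=1.0, size=n_particles)
estimate = np.zeros(n_steps)
for t in range(n_steps):
    # 1. Propagate each particle through the state dynamics.
    particles = a * particles + q_sd * rng.normal(size=n_particles)
    # 2. Weight particles by the likelihood of the new observation.
    log_w = -0.5 * ((y[t] - particles) / r_sd) ** 2
    w = np.exp(log_w - log_w.max())
    w /= w.sum()
    estimate[t] = w @ particles
    # 3. Resample in proportion to the weights.
    particles = rng.choice(particles, size=n_particles, p=w)

rmse_filter = np.sqrt(np.mean((estimate - x) ** 2))
rmse_raw = np.sqrt(np.mean((y - x) ** 2))
print(rmse_filter, rmse_raw)
```

The filtered estimate tracks the hidden state more closely than the raw observations do. One common way to extend such a filter to parameter estimation is to append the unknown parameters to the state vector, so that each particle carries its own parameter values.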

3. Networks

3.1. Mechanistic approaches for modeling small networks

While biological neural networks typically involve anywhere from dozens to many millions of

neurons, studies of small neural networks involving handfuls of cells have led to remarkably

rich insights. We describe three such cases here, and the types of mechanistic models that

drive them.

First, neural networks can produce rhythmic patterns of activity. Such rhythms, or os-

cillations, play clear roles in central pattern generators (CPGs) in which cell groups produce

coordinated ﬁring for, e.g., locomotion or breathing (Grillner and Jessell 2009; Marder and

Bucher 2001). Small network models have been remarkably successful in describing how

such rhythms occur. For example, models involving pairs of cells have revealed how delays

in connections among inhibitory cells, or reciprocal interactions between excitatory and in-

hibitory neurons, can lead to rhythms in the gamma range (30-80 Hz) associated with some

aspects of cognitive processing. A general theory, beginning with two-cell models of this

type, describes how synaptic and intrinsic cellular dynamics interact to determine when the

underlying synchrony will and will not occur (Kopell and Ermentrout 2002). Larger models

involving three or more interacting cell types describe the origin of more complex rhythms,

such as the triphasic rhythm in the stomatogastric ganglion (for digestion in certain invertebrates).

This system in particular has revealed a rich interplay between the intrinsic

dynamics in multiple cells and the synapses that connect them (Marder and Bucher 2001).

There turn out to be many highly distinct parameter combinations, lying in subsets of

parameter space, that all produce the key target rhythm, but do so in very different ways

(Prinz et al. 2004). Understanding the origin of this flexibility, and how biological systems

take advantage of it to produce robust function, is a topic of ongoing work.

Figure 5

Left panel displays the current (“Stim,” for stimulus, at the top of the panel) injected into a

mitral cell from the olfactory system of a mouse, together with the neural spiking response (MC)

across many trials (each row displays the spike train for a particular trial). The response is highly

regular across trials, but at some points in time it is somewhat variable. The right panel displays

a stimulus filter fitted to the complete set of data using model (3), where the stimulus filter, i.e.,

the function $g_0(s)$, represents the contribution to the firing rate due to the current $I(t-s)$ at $s$

milliseconds prior to time $t$. Figure modified from Wang et al. (2015).

The underlying mechanistic models for rhythmic phenomena are of Hodgkin-Huxley

type, involving sodium and potassium channels (Figure 4). For some phenomena, including

respiratory and stomatogastric rhythms, additional ion channels that drive bursting in single

cells play a key role. Dynamical systems tools for assessing the stability of periodic orbits

may then be used to determine what patterns of rhythmic activity will be stably produced

by a given network. Speciﬁcally, coupled systems of biophysical diﬀerential equations can

often be reduced to interacting circular variables representing the phase of each neuron

(Ermentrout and Terman 2010). Such phase models yield to very elegant stability analyses

that can often predict the dynamics of the original biophysical equations.
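A minimal phase model shows how such a stability analysis works. The sketch assumes two identical oscillators at 40 Hz with sinusoidal coupling, which is an illustrative choice rather than a reduction of any particular biophysical model. The phase difference then obeys d(phi)/dt = -2 K sin(phi), so synchrony (phi = 0) is stable for K > 0 and anti-phase firing (phi = pi) is stable for K < 0.

```python
import numpy as np

def final_phase_difference(coupling, omega=2 * np.pi * 40.0, dt=1e-4,
                           t_stop=0.5, theta0=(0.0, 2.0)):
    """Euler integration of two identical phase oscillators with
    sinusoidal coupling (an illustrative coupling function)."""
    theta = np.array(theta0, dtype=float)
    for _ in range(int(round(t_stop / dt))):
        d1 = omega + coupling * np.sin(theta[1] - theta[0])
        d2 = omega + coupling * np.sin(theta[0] - theta[1])
        theta += dt * np.array([d1, d2])
    # Phase difference wrapped to (-pi, pi].
    return np.angle(np.exp(1j * (theta[1] - theta[0])))

sync = final_phase_difference(coupling=20.0)    # attractive coupling
anti = final_phase_difference(coupling=-20.0)   # repulsive coupling
print(sync, anti)
```

Starting from the same initial phase difference, the attractive case converges to synchrony and the repulsive case to anti-phase, in agreement with the sign of the linearization at each fixed point.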

A second example concerns the origin of collective activity in irregularly spiking neural

circuits. To understand the development of correlated spiking in such systems, stochas-

tic diﬀerential equation models, or models driven by point process inputs, are typically

used. This yields Fokker-Planck or population density equations (Tranchina 2010; Tuck-

well 1988) and these can be iterated across multiple layers or neural populations (Doiron

et al. 2006; Tranchina 2010). In many cases, such models can be approximated using lin-


ear response approaches, yielding analytical solutions and considerable mechanistic insight

(De La Rocha et al. 2007; Ostojic and Brunel 2011a). A prominent example comes from the

mechanisms of correlated ﬁring in feedforward networks (De La Rocha et al. 2007; Shadlen

and Newsome 1998). Here, stochastically ﬁring cells send diverging inputs to multiple neu-

rons downstream. The downstream neurons thereby share some of their input ﬂuctuations,

and this, in turn, creates correlated activity that can have rich implications for information

transmission (De La Rocha et al. 2007; Doiron et al. 2016; Zylberberg et al. 2016).
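The effect of shared input on downstream correlations can be illustrated with a toy calculation. The sketch assumes Gaussian input fluctuations, an exponential map from input to firing rate, and Poisson spike counts; none of these choices is taken from the cited studies.

```python
import numpy as np

rng = np.random.default_rng(5)
n_trials = 20000

def count_correlation(shared_fraction):
    """Spike count correlation of two downstream neurons whose input
    fluctuations share a fraction `shared_fraction` of their variance."""
    common = rng.normal(size=n_trials)
    private = rng.normal(size=(2, n_trials))
    inputs = (np.sqrt(shared_fraction) * common
              + np.sqrt(1.0 - shared_fraction) * private)
    rates = 5.0 * np.exp(0.5 * inputs)   # assumed input-to-rate map
    counts = rng.poisson(rates)          # spike counts per trial
    return np.corrcoef(counts[0], counts[1])[0, 1]

corr = {c: count_correlation(c) for c in (0.0, 0.3, 0.6)}
for c, r in corr.items():
    print(c, round(r, 3))
```

The spike count correlation grows with the shared fraction of input variance but stays below it, because the private spiking variability of each neuron dilutes the shared signal.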

A third case of highly inﬂuential small circuit modeling concerns neurons in the early

visual cortex (early in the sense of being only a few synapses from the retina), which are

responsive to visual stimuli (moving bars of light) with speciﬁc orientations that fall within

their receptive ﬁeld (see Section 1.3). Neurons having neighboring regions within their re-

ceptive ﬁeld in which a stimulus excites or inhibits activity were called simple cells, and

those without this kind of sub-division were complex cells. Hubel and Wiesel famously

showed how simple circuit models can account for both the simple and complex cell re-

sponses (Hubel and Wiesel 1959). Later work described this through one or several iterated

algebraic equations that map input firing rates $x_i$ into outputs $y = f\left(\sum_i w_i x_i\right)$, where

$w = (w_1, \ldots, w_N)$ is a synaptic weight vector.
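A weighted-sum model of this form can be written in a few lines. The one-dimensional receptive field, the weights, and the choice of rectification for $f$ are hypothetical and serve only to show how neighboring excitatory and inhibitory regions shape the response.

```python
import numpy as np

def relu(u):
    """Rectification: firing rates cannot be negative (one choice of f)."""
    return np.maximum(u, 0.0)

# Hypothetical 1-D receptive field: excitatory center, inhibitory flanks.
weights = np.array([-1.0, -1.0, 2.0, 2.0, -1.0, -1.0])

# Input firing rates x_i for three stimuli.
bar_on_center = np.array([0.0, 0.0, 1.0, 1.0, 0.0, 0.0])
bar_on_flank = np.array([1.0, 1.0, 0.0, 0.0, 0.0, 0.0])
uniform_light = np.ones(6)

responses = {
    "center": relu(weights @ bar_on_center),
    "flank": relu(weights @ bar_on_flank),
    "uniform": relu(weights @ uniform_light),
}
print(responses)
```

The model responds to a bar on the excitatory region, is silent for a bar on the inhibitory flank, and is also silent for uniform illumination because the weights sum to zero.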

3.2. Statistical methods for small networks

Point process models for small networks begin with conditional intensity speciﬁcations sim-

ilar to that in Equation (2), and include coupling terms (Kass et al. 2014, Section 19.3.4,

and references therein). They have been applied to CPGs described above, in Section 3.1,

to reconstruct known circuitry from spiking data (Gerhard et al. 2013). In addition, many

of the methods we discuss below, in Section 3.4 on large networks, have also been used with

small networks.

3.3. Mechanistic models of large networks across scales and levels of complexity

There is a tremendous variety of mechanistic models of large neural networks. We here

describe these in rough order of their complexity and scale.

3.3.1. Binary and ﬁring rate models. At the simplest level, binary models abstract the ac-

tivity of each neuron as either active (taking the value 1) or silent (0) in a given time step.

As mentioned in the Introduction, despite their simplicity, these models capture funda-

mental properties of network activity (Renart et al. 2010; van Vreeswijk and Sompolinsky

1996) and explain network functions such as associative memory. The proportion of active

neurons at a given time is governed by eﬀective rate equations (Ginzburg and Sompolin-

sky 1994; Wilson and Cowan 1972). Such ﬁring rate models feature a continuous range of

activity states, and often take the form of nonlinear ordinary or stochastic diﬀerential equa-

tions. Like binary models, these also implement associative memory (Hopﬁeld 1984), but

are widely used to describe broader dynamical phenomena in networks, including predic-

tions of oscillations in excitatory-inhibitory networks (Wilson and Cowan 1972), transitions

from ﬁxed point to oscillatory to chaotic dynamics in randomly connected neural networks

(Bos et al. 2016), ampliﬁed selectivity to stimuli, and the formation of line attractors (a

set of stable solutions on a line in state space) that gradually store and accumulate input

signals (Cain and Shea-Brown 2012).
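Associative memory in a binary network can be sketched with a small Hopfield-style model. The sketch assumes random stored patterns, Hebbian weights, and states coded as +1 (active) and -1 (silent), which is equivalent to the 1/0 coding up to a shift and rescaling; the network size and number of patterns are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(6)
n_neurons, n_patterns = 200, 5

# Random patterns with entries +1 (active) or -1 (silent).
patterns = rng.choice([-1, 1], size=(n_patterns, n_neurons))

# Hebbian weights, with no self-connections.
weights = patterns.T @ patterns / n_neurons
np.fill_diagonal(weights, 0.0)

# Cue: the first pattern with 15% of its entries flipped.
state = patterns[0].copy()
flipped = rng.choice(n_neurons, size=30, replace=False)
state[flipped] *= -1
overlap_before = state @ patterns[0] / n_neurons

# Asynchronous updates: each neuron takes the sign of its summed input.
for _ in range(5):
    for i in rng.permutation(n_neurons):
        state[i] = 1 if weights[i] @ state >= 0 else -1
overlap_after = state @ patterns[0] / n_neurons

print(overlap_before, overlap_after)
```

The overlap with the stored pattern rises from 0.7 for the corrupted cue to a value at or near 1 after a few update sweeps, which is the sense in which the network retrieves a memory from a partial cue.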

Firing rate models have been a cornerstone of theoretical neuroscience. Their second


order statistics can analytically be matched to more realistic spiking and binary models

(Grytskyy et al. 2013; Ostojic and Brunel 2011a). We next describe how trial-varying

dynamical ﬂuctuations can emerge in networks of spiking neuron models.

3.3.2. Stochastic spiking activity in networks. A beautiful body of work summarizes the

network state in a population-density approach that describes the evolution of the probabil-

ity density of states rather than individual neurons (Amit and Brunel 1997). The theory is

able to capture refractoriness (Meyer and van Vreeswijk 2002) and adaptation (Deger et al.

2014). Furthermore, although it loses the identity of individual neurons, it can faithfully

capture collective activity states, such as oscillations (Brunel 2000). Small synaptic ampli-

tudes and weak correlations further reduce the time-evolution to a Fokker-Planck equation

(Brunel 2000; Ostojic et al. 2009). Network states beyond such diﬀusion approximations

include neuronal avalanches, the collective and nearly synchronous ﬁring of a large fraction

of cells, often following power-law distributions (Beggs and Plenz 2003). While early work

focused on the ﬁring rates of populations, later work clariﬁed how more subtle patterns

of correlated spiking develop. In particular, linear ﬂuctuations about a stationary state

determine population-averaged measures of correlations (Helias et al. 2013; Ostojic et al.

2009; Tetzlaﬀ et al. 2012; Trousdale et al. 2012).

At an even larger scale, a continuum of coupled population equations at each point

in space leads to neuronal field equations (Bressloff 2012). They predict stable “bumps” of

activity, as well as traveling waves and spirals (Amari 1977a; Roxin et al. 2006). Intriguingly,

when applied as a model of visual cortex and rearranged to reﬂect spatial layout of the

retina, patterns induced in these continuum equations can resemble visual hallucinations

(Bressloﬀ et al. 2001).

Analysis has provided insight into the ways that spiking networks can produce irreg-

ular spike times like those found in cortical recordings from behaving animals (Shadlen

and Newsome 1998), as in Figure 5. Suppose we have a network of $N_E$ excitatory and $N_I$ inhibitory LIF neurons with connections occurring at random according to independent binary (Bernoulli) random variables, i.e., a connection exists when the binary random variable takes the value 1 and does not exist when it is 0. We denote the binary connectivity random variables by $\kappa^{\alpha\beta}_{ij}$, where $\alpha$ and $\beta$ take the values $E$ or $I$, with $\kappa^{\alpha\beta}_{ij} = 1$ when the output of neuron $j$ in population $\beta$ injects current into neuron $i$ in population $\alpha$. We let $J^{\alpha\beta}$ be the coupling strength (representing synaptic current) from a neuron in population $\beta$ to a neuron in population $\alpha$. Thus, the contribution to the current input of neuron $i$ in population $\alpha$ generated at time $t$ by a spike from neuron $j$ in population $\beta$ at time $s$ will be $J^{\alpha\beta}\kappa^{\alpha\beta}_{ij}\,\delta(t-s)$, where $\delta(t-s)$ is the Dirac delta function. The behavior of the network can be analyzed by letting $N_E \to \infty$ and $N_I \to \infty$. Based on reasonable simplifying assumptions, the mean $M_\alpha$ and variance $V_\alpha$ of the total current for population $\alpha$ have been derived (Amit and Brunel 1997; Van Vreeswijk and Sompolinsky 1998), and these determine the regularity or irregularity in spiking activity.
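The random wiring described above can be sketched in a few lines. The following is only an illustration, assuming the population size (1000) and connection probability (0.2) quoted in the caption of Figure 6; the function name `sample_connectivity` and the choice of 50 example target neurons are hypothetical.

```python
import random

def sample_connectivity(n_post, n_pre, p, rng):
    """Return kappa as a list of rows; kappa[i][j] = 1 when presynaptic neuron j
    projects to postsynaptic neuron i, drawn independently with probability p."""
    return [[1 if rng.random() < p else 0 for _ in range(n_pre)]
            for _ in range(n_post)]

rng = random.Random(0)            # fixed seed so the sketch is reproducible
n_e, p = 1000, 0.2                # values taken from the Figure 6 caption
kappa_ee = sample_connectivity(50, n_e, p, rng)   # 50 example target neurons

# The in-degree of each target neuron is the row sum; its mean estimates K = p * N.
in_degrees = [sum(row) for row in kappa_ee]
mean_in_degree = sum(in_degrees) / len(in_degrees)
print(mean_in_degree)
```

The mean in-degree fluctuates around $K = pN_E = 200$, the average number of inputs per neuron used in the scaling arguments that follow.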

We step through three possibilities, under three diﬀerent conditions on the network,

using a modiﬁcation of the LIF equation found in Figure 3. The set of equations, for

all the neurons in the network, includes terms deﬁned by network connectivity and also

terms deﬁned by external input ﬂuctuations. Because the connectivity matrix may contain

cycles (there may be a path from any neuron back to itself), network connectivity is called

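recurrent. Let us take the membrane potential of neuron $i$ from population $\alpha$ to be

$$\tau_\alpha \frac{dV^\alpha_i}{dt} = -V^\alpha_i + \underbrace{\mu^\alpha_0 + \sqrt{\tau_\alpha}\,\sigma^\alpha_0\,\xi^\alpha_i(t)}_{\text{external inputs}} + \underbrace{\tau_\alpha J^{\alpha E} \sum_{j=1}^{N_E} \kappa^{\alpha E}_{ij}\,\delta(t - t^E_{jk})}_{\text{recurrent excitation}} - \underbrace{\tau_\alpha J^{\alpha I} \sum_{j=1}^{N_I} \kappa^{\alpha I}_{ij}\,\delta(t - t^I_{jk})}_{\text{recurrent inhibition}} \tag{5}$$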

where $t^\alpha_{ik}$ is the $k$th spike time from neuron $i$ of population $\alpha$, $\tau_\alpha$ is the membrane dynamics time constant, and the external inputs include both a constant $\mu_0$ and a fluctuating source $\sigma_0\xi(t)$, where $\xi(t)$ is white noise (independent across neurons). This set of equations is supplemented with the spike reset rule that when $V^\alpha_i(t) = V_T$ the voltage resets to $V_R < V_T$.

The firing rate of the average neuron in population $\alpha$ is $\lambda_\alpha = \sum_j \sum_k \delta(t - t^\alpha_{jk})/N_\alpha$. For the network to remain stable, we take these firing rates to be bounded, i.e., $\lambda_\alpha \sim \mathcal{O}(1)$. Similarly, to assure that the current input to each neuron remains bounded, some assumption must be made about the way coupling strengths $J^{\alpha\beta}$ scale as the number of inputs $K$ increases. Let us take the scaling to be $J^{\alpha\beta} = j^{\alpha\beta}/K^\gamma$, with $j^{\alpha\beta} \sim \mathcal{O}(1)$, as $K \to \infty$, where $\gamma$ is a scaling exponent. We describe the resulting spiking behavior under scaling conditions $\gamma = 1$ and $\gamma = 1/2$.
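To make the network equations and the reset rule concrete, here is a minimal forward-Euler sketch of a small excitatory-inhibitory LIF network with delta-pulse synapses and the coupling scaling $J = j/K^\gamma$. It is an illustration under stated assumptions, not the simulation behind Figure 6: all parameter values, the network size, the exclusion of self-connections, and the function name `simulate_lif_network` are hypothetical choices. A delta pulse of weight $\tau J$ on the right-hand side is implemented as an instantaneous voltage jump of size $J$.

```python
import math
import random

def simulate_lif_network(n_e=100, n_i=100, p=0.2, gamma=0.5, j_e=1.0, j_i=2.0,
                         mu0=1.2, sigma0=0.0, tau=10.0, v_t=1.0, v_r=0.0,
                         t_max=200.0, dt=0.1, seed=0):
    """Forward-Euler sketch of an E-I LIF network (times in ms).
    Returns a list with the spike times of every neuron (E first, then I)."""
    rng = random.Random(seed)
    n = n_e + n_i
    k = p * n_e                               # mean number of inputs per population
    jump_e = j_e / k ** gamma                 # excitatory voltage jump, J = j / K^gamma
    jump_i = -j_i / k ** gamma                # inhibitory voltage jump (negative)
    # targets[j]: neurons that receive a pulse when neuron j spikes (no self-loops)
    targets = [[i for i in range(n) if i != j and rng.random() < p]
               for j in range(n)]
    v = [rng.uniform(v_r, v_t) for _ in range(n)]   # random initial voltages
    spikes = [[] for _ in range(n)]
    noise_scale = sigma0 * math.sqrt(dt / tau)      # white-noise increment size
    for step in range(int(t_max / dt)):
        t = step * dt
        for i in range(n):
            v[i] += dt / tau * (mu0 - v[i])         # leak plus constant drive
            if noise_scale:
                v[i] += noise_scale * rng.gauss(0.0, 1.0)
        fired = [i for i in range(n) if v[i] >= v_t]
        for j in fired:
            spikes[j].append(t)
            v[j] = v_r                              # reset rule
        for j in fired:
            jump = jump_e if j < n_e else jump_i
            for i in targets[j]:
                v[i] += jump                        # delta pulse arrives as a jump
    return spikes

spikes = simulate_lif_network()
rates_hz = [1000.0 * len(s) / 200.0 for s in spikes]   # spikes per second
print(sum(rates_hz) / len(rates_hz))
```

Changing `gamma` between 1 and 0.5, or setting `sigma0` above zero, gives a rough way to explore the regimes discussed in the text, although a network this small should not be expected to reproduce the large-$K$ behavior quantitatively.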

[Figure 6 image. Panel A: spike rasters (neuron index 1 to 1000 versus time, 0 to 300 ms) for three network conditions, with schematic labels for inhibition, excitation, external noise, and large coupling. Panel B: density of firing rate (Hz). Panel C: density of ISI coefficient of variation. Legend: weak coupling (stochastic) and strong coupling (deterministic).]

Figure 6

Panel A displays plots of spike trains from 1000 excitatory neurons in a network having 1000 excitatory and 1000 inhibitory LIF neurons with connections determined from independent Bernoulli random variables having success probability of 0.2; on average $K = 200$ inputs per neuron with no synaptic dynamics. Each neuron receives a static depolarizing input; in the absence of coupling each neuron fires repetitively. Left: Spike trains under weak coupling, current $J \propto K^{-1}$. Middle: Spike trains under weak coupling, with additional uncorrelated noise applied to each cell. Right: Spike trains under strong coupling, $J \propto K^{-1/2}$. Panel B shows the distribution of firing rates across cells, and panel C the distribution of interspike interval (ISI) coefficient of variation across cells.

If we set $\gamma = 1$ then we have $J \sim 1/K$, so that $JK = j \sim \mathcal{O}(1)$. In this case we get $M_\alpha \sim \mathcal{O}(1)$ and $V_\alpha = [\sigma^\alpha_0]^2 + \mathcal{O}(1/\sqrt{K})$. If we further set $\sigma^\alpha_0 = 0$, so that all fluctuations must be internal, then $V_\alpha$ vanishes for large $K$. In such networks, after an initial transient, the neurons synchronize, and each fires with perfect rhythmicity (left part of panel A in Figure 6). This is very different from the irregularity seen in cortical recordings (Figure 3). Therefore, some modification must be made.

The first route to appropriate spike train irregularity keeps $\gamma = 1$ while setting $[\sigma^\alpha_0]^2 \sim \mathcal{O}(1)$ so that $V_\alpha$ no longer vanishes in the large $K$ limit. Simulations of this network (Figure 6A, middle) maintain realistic rates (Figure 6B, red curve), and also show realistic irregularity (Faisal et al. 2008), as quantified in Figure 6C by the coefficient of variation (CV) of the inter-spike intervals. Treating irregular spiking activity as the consequence of stochastic inputs has a long history (Tuckwell 1988).
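The irregularity measure used here, the coefficient of variation of the inter-spike intervals, is the standard deviation of the intervals divided by their mean. A minimal sketch follows; the two synthetic spike trains (one perfectly periodic, one with exponentially distributed intervals, as for a Poisson process) and the helper name `isi_cv` are illustrative assumptions rather than data from the paper.

```python
import random
import statistics

def isi_cv(spike_times):
    """Coefficient of variation of inter-spike intervals: std(ISI) / mean(ISI)."""
    isis = [b - a for a, b in zip(spike_times, spike_times[1:])]
    return statistics.pstdev(isis) / statistics.mean(isis)

rng = random.Random(1)

# Perfectly rhythmic firing every 10 ms: all intervals equal, so the CV is zero.
regular = [10.0 * n for n in range(1, 2001)]

# Poisson-like firing: exponential intervals with mean 10 ms, CV close to one.
poisson, t = [], 0.0
for _ in range(2000):
    t += rng.expovariate(0.1)
    poisson.append(t)

cv_regular = isi_cv(regular)
cv_poisson = isi_cv(poisson)
print(cv_regular, cv_poisson)
```

A CV near zero corresponds to the rhythmic firing of the synchronized network, while values near one correspond to the irregular firing typical of cortical recordings.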

The second route does not rely on external input stochasticity, but instead increases the synaptic connection strengths by setting $\gamma = 1/2$. As a consequence we get $V_\alpha \sim \mathcal{O}(1)$ even if $\sigma^\alpha_0 = 0$, so that variability is internally generated through recurrent interactions (Monteforte and Wolf 2012; Van Vreeswijk and Sompolinsky 1998), but to get $M_\alpha \sim \mathcal{O}(1)$, an additional condition is needed. If the recurrent connectivity is dominated by inhibition, so that the network recurrence results in negative current, the activity dynamically settles into a state in which

$$M_\alpha = \sqrt{K}\,\underbrace{\left(\mu^\alpha + j^{\alpha E}\tau_\alpha\lambda_E - j^{\alpha I}\tau_\alpha\lambda_I\right)}_{\mathcal{O}(1/\sqrt{K}):\ \text{balance condition}} \sim \mathcal{O}(1), \tag{6}$$

where $\mu^\alpha_0$ has been replaced by the constant $\mu^\alpha$ using $\mu^\alpha_0 = \sqrt{K}\mu^\alpha$ so that the mean external input is of order $\mathcal{O}(\sqrt{K})$. The scaling $\gamma = 1/2$ now makes the total excitatory and the total inhibitory synaptic inputs individually large, i.e., $\mathcal{O}(\sqrt{K})$, so that $V_\alpha$ is also large. However, given the balance condition in (6), excitation and inhibition mutually cancel and $V_\alpha$ remains moderate. Simulations of the network with $\gamma = 1/2$ and $\sigma^\alpha_0 = 0$ show an asynchronous network dynamic (Figure 6A, right). Further, the firing rates stabilize at low mean levels (Figure 6B, blue curve), while the inter-spike interval CV is large (Figure 6C, blue curve).
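The contrast between the two scalings can be checked with simple arithmetic. The sketch below assumes $K$ independent Poisson inputs from each population and the usual diffusion approximation, in which an input stream of rate $\lambda$ and weight $J$ contributes $KJ\tau\lambda$ to the mean and $KJ^2\tau\lambda$ to the variance; these formulas, the parameter values, and the name `input_moments` are assumptions made for illustration and are not spelled out in the text.

```python
def input_moments(k, gamma, j_e=1.0, j_i=2.0, rate_e=0.01, rate_i=0.01, tau=10.0):
    """Mean excitatory input, mean inhibitory input, and recurrent input variance
    for K excitatory and K inhibitory Poisson inputs with J = j / K**gamma.
    Rates are in spikes per ms and tau is in ms."""
    big_j_e = j_e / k ** gamma
    big_j_i = j_i / k ** gamma
    mean_e = k * big_j_e * tau * rate_e       # total excitatory mean input
    mean_i = k * big_j_i * tau * rate_i       # total inhibitory mean input
    variance = k * (big_j_e ** 2 * rate_e + big_j_i ** 2 * rate_i) * tau
    return mean_e, mean_i, variance

for k in (100, 10000):
    print(k, input_moments(k, 1.0), input_moments(k, 0.5))
```

Under these assumptions, with $\gamma = 1$ the mean inputs stay fixed while the recurrent variance shrinks as $K$ grows (as $1/K$ in this particular approximation). With $\gamma = 1/2$ the variance stays fixed, but the excitatory and inhibitory means each grow like $\sqrt{K}$, which is why the cancellation expressed by the balance condition is needed to keep the net mean input moderate.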

These two mechanistic routes to high levels of neural variability diﬀer strikingly in the

degree of heterogeneity of the spiking statistics. For weak coupling with $\gamma = 1$, the resulting distributions of firing rates and inter-spike interval CVs are narrow (Figure 6B,C, red curves). At strong coupling with $\gamma = 1/2$, however, the spread of firing rates is large:

over half of the neurons ﬁre at rates below 1 Hz (Figure 6B, blue curve), in line with

observed cortical activity (Roxin et al. 2011). The approximate dynamic balance between

excitatory and inhibitory synaptic currents has been conﬁrmed experimentally (Okun and

Lampl 2008) and is usually called balanced excitation and inhibition.

3.3.3. Asynchronous dynamics in recurrent networks. The analysis above focused only on $M_\alpha$ and $V_\alpha$, ignoring any correlated activity between the currents to neurons in the network. The original justification for such asynchronous dynamics in Van Vreeswijk and Sompolinsky (1998) and Amit and Brunel (1997) relied on a sparse wiring assumption, i.e., $K/N_\alpha \to 0$ as $N_\alpha \to \infty$ for $\alpha \in \{E, I\}$. However, more recently it has been shown that the balanced

mechanism required to keep ﬁring rates moderate also ensures that network correlations

vanish. Balance arises from the dominance of negative feedback which suppresses ﬂuc-

tuations in the population-averaged activity and hence causes small pairwise correlations

(Tetzlaﬀ et al. 2012). As a consequence, ﬂuctuations of excitatory and inhibitory synaptic

currents are tightly locked so that Equation (6) is satisﬁed. The excitatory and inhibitory

cancellation mechanism therefore extends to pairs of cells and operates even in networks

with dense wiring, i.e., $K/N_\alpha \sim \mathcal{O}(1)$ (Hertz 2010; Renart et al. 2010), so that input correlations are much weaker than expected by the number of shared inputs (Shadlen and

Newsome 1998; Shea-Brown et al. 2008). This suppression and cancellation of correlations

holds in the same way for intrinsically-generated ﬂuctuations that often even dominate the

correlation structure (Helias et al. 2014). Recent work has shown that the asynchronous

state is more robustly realized in nonrandom networks than normally distributed random

networks (Litwin-Kumar and Doiron 2012; Teramae et al. 2012).

There is a large literature on how network connectivity, at the level of mechanistic mod-

els, leads to diﬀerent covariance structures in network activity (Ginzburg and Sompolinsky

1994). Highly local connectivity features scale up to determine global levels of covariance

(Doiron et al. 2016; Helias et al. 2013; Trousdale et al. 2012). Moreover, features of that

connectivity that point speciﬁcally to low-dimensional structures of neural covariability can

be isolated (Doiron et al. 2016). An outstanding problem is to create model networks that

mimic the low-dimensional covariance structure reported in experiments (see Section 3.4.1).

3.4. Statistical methods for large networks

New recording technologies should make it possible to track the ﬂow of information across

very large networks of neurons, but the details of how to do so have not yet been established.

One tractable component of the problem (Cohen and Kohn 2011) involves co-variation in

spiking activity among many neurons (typically dozens to hundreds), which leads naturally

to dimensionality reduction and to graphical representations (where neurons are nodes,

and some deﬁnition of correlated activity determines edges). However, two fundamental

complications aﬀect most experiments. First, co-variation can occur at multiple timescales.

A simpliﬁcation is to consider either spike counts in coarse time bins (20 milliseconds or

longer) or spike times with precision in the range of 1-5 milliseconds. We will discuss meth-

ods based on spike counts and precise spike timing separately, in the next two subsections.

Second, experiments almost always involve some stimuli or behaviors that create evolving

conditions within the network. Thus, methods that assume stationarity must be used with

care, and analyses that allow for dynamic evolution will likely be useful. Fortunately, many

experiments are conducted using multiple exposures to the same stimuli or behavioral cues,

which creates a series of putatively independent replications (trials). While the responses

across trials are variable, sometimes in systematic ways, the setting of multiple trials often

makes tractable the analysis of non-stationary processes.

After reviewing techniques for analyzing co-variation of spike counts and precisely-timed

spiking we will also brieﬂy mention three general approaches to understanding network

behavior: reinforcement learning, Bayesian inference, and deep learning. Reinforcement

learning and Bayesian inference use a decision-theoretic foundation to deﬁne optimal actions

of the neural system in achieving its goals, which is appealing insofar as evolution may drive

organism design toward optimality.

3.4.1. Correlation and dimensionality reduction in spike counts. Dimensionality reduction

methods have been fruitfully applied to study decision-making, learning, motor control,

olfaction, working memory, visual attention, audition, rule learning, speech, and other phe-

nomena (Cunningham and Yu 2014). Dimensionality reduction methods that have been

used to study neural population activity include principal component analysis, factor anal-

ysis, latent dynamical systems, and non-linear methods such as Isomap and locally-linear

embedding. Such methods can provide two types of insights. First, the time course of the

neural response can vary substantially from one experimental trial to the next, even though

the presented stimulus, or the behavior, is identical on each trial. In such settings, it is of

interest to examine population activity on individual trials (Churchland et al. 2007). Di-

mensionality reduction provides a way to summarize the population activity time course on

individual experimental trials by leveraging the statistical power across neurons (Yu et al.

2009). One can then study how the latent variables extracted by dimensionality reduction

change across time or across experimental conditions. Second, the multivariate statistical

structure in the population activity identiﬁed by dimensionality reduction may be indica-

tive of the neural mechanisms underlying various brain functions. For example, one study

suggested that a subject can imagine moving their arms, while not actually moving them,

when neural activity related to motor preparation lies in a space orthogonal to that related

to motor execution (Kaufman et al. 2014). Furthermore, the multivariate structure of pop-

ulation activity can help explain why some tasks are easier to learn than others (Sadtler

et al. 2014) and how subjects respond diﬀerently to the same stimulus in diﬀerent contexts

(Mante et al. 2013).
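As a small illustration of how dimensionality reduction summarizes shared variability across neurons, the sketch below extracts the leading principal component of a trials-by-neurons matrix using power iteration on the sample covariance. The synthetic data (a single shared latent fluctuation plus independent noise, with Gaussian values standing in for spike counts), the sizes, and the function name `top_principal_component` are all hypothetical.

```python
import math
import random

def top_principal_component(data, n_iter=200):
    """Leading eigenvector of the sample covariance, by power iteration, and the
    fraction of total variance it explains. `data` is a list of trials, each a
    list of per-neuron counts."""
    n, d = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in data]
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    w = [1.0 / math.sqrt(d)] * d                   # starting direction
    for _ in range(n_iter):
        w = [sum(cov[a][b] * w[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        w = [x / norm for x in w]
    eigenvalue = sum(w[a] * cov[a][b] * w[b] for a in range(d) for b in range(d))
    total_variance = sum(cov[a][a] for a in range(d))
    return w, eigenvalue / total_variance

rng = random.Random(3)
counts = []
for _ in range(300):                               # 300 synthetic trials
    z = rng.gauss(0.0, 1.0)                        # shared latent fluctuation
    counts.append([5.0 + 2.0 * z + rng.gauss(0.0, 1.0) for _ in range(8)])

weights, explained = top_principal_component(counts)
print(explained)
```

Because all eight synthetic neurons load on the same latent variable, a single component accounts for most of the variance. Methods such as factor analysis refine this idea by separating shared variance from variance private to each neuron.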

3.4.2. Correlated spiking activity at precise time scales. In principle, very large quantities of

information could be conveyed through the precise timing of spikes across groups of neurons.

The idea that the nervous system might be able to recognize such patterns of precise timing

is therefore an intriguing possibility (Abeles 1982; Geman 2006; Singer and Gray 1995).

However, it is very diﬃcult to obtain strong experimental evidence in favor of a widespread

computational role for precise timing (e.g., an accuracy within 1-5 milliseconds), beyond

the inﬂuence of the high arrival rate of synaptic impulses when multiple input neurons ﬁre

nearly synchronously. Part of the issue is experimental, because precise timing may play

an important role only in specialized circumstances, but part is statistical: under plausible

point process models, patterns such as nearly synchronous ﬁring will occur by chance,

and it may be challenging to deﬁne a null model that captures the null concept without

producing false positives. For example, when the ﬁring rates of two neurons increase, the

number of nearly synchronous spikes will increase even when the spike trains are otherwise

independent; thus, a null model with constant ﬁring rates could produce false positives

for the null hypothesis of independence. This makes the detection of behaviorally-relevant

spike patterns a subtle statistical problem (Grün 2009; Harrison et al. 2013).
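The false-positive problem can be seen in a small simulation. In the sketch below, two spike trains are generated independently in 1 ms bins, but both share the same step increase in firing probability. Comparing the observed number of coincident bins with the number expected under a constant-rate null shows an apparent excess of synchrony. The rates, the duration, and the variable names are illustrative assumptions.

```python
import random

rng = random.Random(2)
n_bins = 200_000                                     # 1 ms bins
# Both neurons share a firing probability that steps from 0.01 to 0.05 per bin.
p = [0.01] * (n_bins // 2) + [0.05] * (n_bins // 2)
train_a = [rng.random() < q for q in p]
train_b = [rng.random() < q for q in p]              # independent given the rate

observed = sum(a and b for a, b in zip(train_a, train_b))

# Null model 1: constant rates estimated from the whole recording.
rate_a = sum(train_a) / n_bins
rate_b = sum(train_b) / n_bins
expected_constant = n_bins * rate_a * rate_b

# Null model 2: independence with the true time-varying rates.
expected_varying = sum(q * q for q in p)

print(observed, expected_constant, expected_varying)
```

The observed coincidence count is close to the time-varying expectation and well above the constant-rate expectation, even though the trains are independent given the rate. A test built on the constant-rate null would therefore tend to report synchrony that is fully explained by rate covariation.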

A strong indication that precise timing of spikes may be relevant to behavior came

from an experiment involving hand movement, during which pairs of neurons in motor

cortex ﬁred synchronously (within 5 milliseconds of each other) more often than predicted

by an independent Poisson process model and, furthermore, these events, called Unitary

Events, clustered around times that were important to task performance (Riehle et al. 1997).

While this illustrated the potential role of precisely timed spikes, it also raised the issue

of whether other plausible point process null models might lead to diﬀerent results. Much

work has been done to refine this methodology (Albert et al. 2016; Grün 2009; Torre et al.

2016). Related approaches replace the null assumption of independence with some order of

correlation, using marked Poisson processes (Staude et al. 2010).

There is a growing literature on dependent point processes. Some models do not include

a speciﬁc mechanism for generating precise spike timing, but can still be used as null models

for hypothesis tests of precise spike timing. On a coarse time scale, point process regression

models as in Equation (1) can incorporate eﬀects of one neuron’s spiking behavior on

another (Pillow et al. 2008; Truccolo 2010). On a ﬁne time scale, one may instead consider

multivariate binary processes (multiple sequences of 0s and 1s where 1s represent spikes). In

the stationary case, a standard statistical tool for analyzing binary data involves loglinear

models (Agresti 1996), where the log of the joint probability of any particular pattern is

represented as a sum of terms that involve successively higher-order interactions, i.e., terms

that determine the probability of spiking within a given time bin for individual neurons,

pairs of neurons, triples, etc. Two-way interaction models, also called maximum entropy

models, which exclude higher than pairwise interactions, have been used in several studies

and in some cases higher-order interactions have been examined (Ohiorhenuan et al. 2010;

Santos et al. 2010; Shimazaki et al. 2015), sometimes using information geometry (Nakahara

et al. 2006), though large amounts of data may be required to ﬁnd small but plausibly

interesting eﬀects (Kelly and Kass 2012). Extensions to non-stationary processes have also

been developed (Shimazaki et al. 2012; Zhou et al. 2015). Dichotomized Gaussian models,

which instead produce binary outputs from threshold crossings of a latent multivariate

Gaussian random variable, have also been used (Amari et al. 2003; Shimazaki et al. 2015),

as have Hawkes processes (Jovanović et al. 2015). A variety of correlation structures may

be accommodated by analyzing cumulants (Staude et al. 2010).
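For concreteness, the loglinear representation described above can be written out as follows. The notation is an assumption introduced here for illustration: $x_i \in \{0, 1\}$ indicates whether neuron $i$ spikes in a given time bin, and the $\theta$ terms are interaction parameters, with $\theta_0$ fixed by normalization.

$$\log P(x_1, \ldots, x_N) = \theta_0 + \sum_{i} \theta_i x_i + \sum_{i<j} \theta_{ij}\, x_i x_j + \sum_{i<j<k} \theta_{ijk}\, x_i x_j x_k + \cdots$$

The two-way interaction (pairwise maximum entropy) model is the special case in which every term of third order and higher is set to zero, leaving

$$\log P(x_1, \ldots, x_N) = \theta_0 + \sum_{i} \theta_i x_i + \sum_{i<j} \theta_{ij}\, x_i x_j .$$

Testing for higher-order interactions then amounts to asking whether parameters such as $\theta_{ijk}$ differ detectably from zero, which is where the data requirements noted above become demanding.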

To test hypotheses about precise timing, several authors have suggested procedures akin

to permutation tests or nonparametric bootstrap. The idea is to generate re-sampled data,

also called pseudo-data or surrogate data, that preserves as many of the features of the

original data as possible, but that lacks the feature of interest, such as precise spike timing.

A simple case, called dithering or jittering, modiﬁes the precise time of each spike by some

random amount within a small interval, thereby preserving all coarse temporal structure

and removing all ﬁne temporal structure. Many variations on this theme have been explored

(Grün 2009; Harrison et al. 2013; Platkiewicz et al. 2017), and connections have been made

with the well-established statistical notion of conditional inference (Harrison et al. 2015).
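A jitter-based surrogate test can be sketched as follows. This is one simple variant (interval jitter, in which each spike is redrawn uniformly within the fixed window that contains it), offered as an illustration rather than as the procedure of any cited paper. The synchrony statistic, the window width, the tolerance, the synthetic spike trains, and all function names are assumptions.

```python
import bisect
import random

def interval_jitter(spike_times, width, rng):
    """Redraw each spike uniformly within the fixed window of length `width`
    that contains it, so spike counts per window are preserved."""
    return sorted((t // width) * width + rng.random() * width for t in spike_times)

def synchrony_count(train_a, train_b, tol):
    """Number of spikes in train_a with a train_b spike within tol (train_b sorted)."""
    count = 0
    for t in train_a:
        i = bisect.bisect_left(train_b, t - tol)
        if i < len(train_b) and train_b[i] <= t + tol:
            count += 1
    return count

def jitter_p_value(train_a, train_b, tol, width, n_surrogates, rng):
    """Fraction of surrogates (plus the original) at least as synchronous as the data."""
    observed = synchrony_count(train_a, train_b, tol)
    exceed = sum(
        synchrony_count(interval_jitter(train_a, width, rng), train_b, tol) >= observed
        for _ in range(n_surrogates))
    return (1 + exceed) / (1 + n_surrogates)

rng = random.Random(4)
train_b = sorted(rng.uniform(0.0, 10_000.0) for _ in range(200))   # times in ms
# Half of train_a follows train_b by 0.5 ms; the other half is independent.
train_a = sorted([t + 0.5 for t in train_b[::2]] +
                 [rng.uniform(0.0, 10_000.0) for _ in range(100)])

p_value = jitter_p_value(train_a, train_b, tol=1.0, width=20.0,
                         n_surrogates=200, rng=rng)
print(p_value)
```

Because the surrogates keep the number of spikes in every 20 ms window, any rate covariation slower than the window is retained under the null, and only structure finer than the window contributes to a small p-value.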

3.4.3. Reinforcement learning. Reinforcement learning (RL) grew from attempts to de-

scribe mathematically the way organisms learn in order to achieve repeatedly-presented

goals. The motivating idea was spelled out in 1911 by Thorndike (Thorndike 1911, p. 244):

when a behavioral response in some situation leads to reward (or discomfort) it becomes

associated with that reward (or discomfort), so that the behavior becomes a learned re-

sponse to the situation. While there were important precursors (Bush and Mosteller 1955;

Rescorla and Wagner 1972), the basic theory reached maturity with the 1998 publication

of the book by Sutton and Barto (Sutton and Barto 1998). Within neuroscience, a key

discovery involved the behavior of dopamine neurons in certain tasks: they initially ﬁre in

response to a reward but, after learning, ﬁre in response to a stimulus that predicts reward;

this was consistent with predictions of RL (Schultz et al. 1997). (Dopamine is a neuromod-

ulator, meaning a substance that, when emitted from the synapses of neurons, modulates

the synaptic eﬀects of other neurons; a dopamine neuron is a neuron that emits dopamine;

dopamine is known to play an essential role in goal-directed behavior.)

In brief, the mathematical framework is that of a Markov decision process, which is

an action-dependent Markov chain (i.e., a stochastic process on a set of states where the

probability of transitioning from one state to the next is action-dependent) together with

rewards that depend on both state transition and action. When an agent (an abstract entity

representing an organism, or some component of its nervous system) reaches stationarity

after learning, the current value $V_t$ of an action may be represented in terms of its future-discounted expected reward:

$$V_t = E(R_t + \gamma R_{t+1} + \gamma^2 R_{t+2} + \gamma^3 R_{t+3} + \cdots) = E(R_t + \gamma V_{t+1}),$$

where $R_t$ is the reward at time $t$ and $\gamma$ is here a discount factor between 0 and 1 (unrelated to the scaling exponent used earlier). Thus, to drive the agent toward this stationarity condition, the current estimate of value $\hat{V}_t$ should be updated in such a way as to decrease the estimated magnitude of $E(R_t + \gamma V_{t+1}) - V_t$, which is known as the reward prediction error (RPE),

$$\delta_t = \hat{E}(R_t + \gamma V_{t+1}) - \hat{V}_t = r_t + \gamma \hat{V}_{t+1} - \hat{V}_t.$$

This is also called the temporal diﬀerence learning error. RL algorithms accomplish learning

by sequentially reducing the magnitude of the RPE. The essential interpretation of Schultz

et al. (1997), which remains widely inﬂuential, was that dopamine neurons signal RPE.
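The update driven by the reward prediction error can be sketched with temporal difference learning on a toy problem. The task below (a deterministic three-state chain with a single reward on the final transition), the learning rate `alpha`, and the function name `td0_chain` are hypothetical choices made for illustration; values are indexed by state rather than by time.

```python
def td0_chain(n_states=3, gamma=0.9, alpha=0.1, n_episodes=2000):
    """TD(0) value learning on a deterministic chain 0 -> 1 -> ... -> terminal,
    with reward 1 on the final transition and 0 elsewhere."""
    values = [0.0] * (n_states + 1)       # last entry: terminal state, value 0
    mean_abs_rpe = []
    for _ in range(n_episodes):
        total = 0.0
        for s in range(n_states):
            reward = 1.0 if s == n_states - 1 else 0.0
            # reward prediction error: r + gamma * V(next state) - V(current state)
            delta = reward + gamma * values[s + 1] - values[s]
            values[s] += alpha * delta    # move the estimate to shrink the error
            total += abs(delta)
        mean_abs_rpe.append(total / n_states)
    return values[:n_states], mean_abs_rpe

values, rpe_history = td0_chain()
print(values)
print(rpe_history[0], rpe_history[-1])
```

As learning proceeds the estimated values approach the discounted future reward of each state, and the prediction error at the time of reward shrinks toward zero while earlier states acquire value. This is the qualitative pattern that the dopamine interpretation appeals to.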

The RL-based description of the activity of dopamine neurons has been considered

one of the great success stories in computational neuroscience, operating at the levels of

computation and algorithm in Marr’s framework (see Section 1.1). A wide range of further

studies have elaborated the basic framework and taken on topics such as the behavior

of other neuromodulators; neuroeconomics; the distinction between model-based learning,

where transition probabilities are learned explicitly, and model-free learning; social behavior

and decision-making; and the role of time and internal models in learning (Dayan and

Nakahara 2017; Schultz 2015).

3.4.4. Bayesian inference. Although statistical methods based on Bayes’ Theorem now play

a major role in statistics, they were, until relatively recently, controversial (McGrayne 2011).

In neuroscience, Bayes’ Theorem has been used in many theoretical constructions in part

because the brain must combine prior knowledge with current data somehow, and also

because evolution may have led to neural network behavior that is, like Bayesian inference

(under well speciﬁed conditions), optimal, or nearly so. Bayesian inference has played a

prominent role in theories of human problem-solving (Anderson 2009), visual perception

(Geisler 2011), sensory and motor integration (Körding 2007; Wolpert et al. 2011), and

general cortical processing (Griﬃths et al. 2012).
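The simplest instance of combining prior knowledge with current data is the Gaussian case, where the posterior mean is a precision-weighted average of the prior mean and the observation. The sketch below is a textbook illustration, not a model from the cited studies; the function name `gaussian_posterior` and the numerical values are assumptions.

```python
def gaussian_posterior(prior_mean, prior_var, obs, obs_var):
    """Posterior mean and variance for a Gaussian prior and one Gaussian observation."""
    precision = 1.0 / prior_var + 1.0 / obs_var       # precisions add
    post_var = 1.0 / precision
    post_mean = post_var * (prior_mean / prior_var + obs / obs_var)
    return post_mean, post_var

# Equal reliability: the estimate lies halfway between prior mean and observation.
print(gaussian_posterior(0.0, 1.0, 2.0, 1.0))
# A more reliable observation pulls the estimate closer to the data.
print(gaussian_posterior(0.0, 1.0, 2.0, 0.25))
```

The same precision weighting underlies standard accounts of cue combination, in which a more reliable sensory cue receives a larger weight.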

3.4.5. Deep learning. Deep learning (LeCun et al. 2015) is an outgrowth of PDP modeling

(see Section 1.4). Two major architectures came out of the 1980s and 1990s, convolutional neural networks (CNNs) and long short-term memory (LSTM). LSTM (Hochreiter

and Schmidhuber 1997) enables neural networks to take as input sequential data of arbi-

trary length and learn long-term dependencies by incorporating a memory module where

information can be added or forgotten according to functions of the current input and state

of the system. CNNs, which achieve state-of-the-art results in many image classification tasks, take inspiration from the visual system by incorporating receptive fields and enforcing shift-invariance (physiological visual object recognition being invariant to shifts in

location). In deep learning architectures, receptive ﬁelds (LeCun et al. 2015) identify a very

speciﬁc input pattern, or stimulus, in a small spatial region, using convolution to combine

inputs. Receptive ﬁelds induce sparsity and lead to signiﬁcant computational savings, which

prompted early success with CNNs (LeCun 1989). Shift invariance is achieved through a

spatial smoothing operator known as pooling (a weighted average, or often the maximum

value, over a local neighborhood of nodes). Because it introduces redundancies, pooling

is often combined with downsampling. Many layers, each using convolution and pooling,

are stacked to create a deep network, in rough analogy to multiple anatomical layers in

the visual system of primates. Although artiﬁcial neural networks had largely fallen out of

widespread use by the end of the 1990s, faster computers combined with availability of very

large repositories of training data, and the innovation of greedy layer-wise training (Bengio

et al. 2007) brought large gains in performance and renewed attention, especially when

AlexNet (Krizhevsky et al. 2012) was applied to the ImageNet database (Deng et al.

2009). Rapid innovation has enabled the application of deep learning to a wide variety of

problems of increasing size and complexity.
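The roles of receptive fields and pooling can be illustrated with a toy example in plain Python. A 2x2 kernel acts as a local pattern detector, a threshold keeps only full matches, and non-overlapping max pooling (which also downsamples) makes the output insensitive to a one-pixel shift of the pattern within a pooling cell. The image size, kernel, threshold, and function names are hypothetical, and real CNNs learn their kernels rather than fixing them by hand.

```python
def conv2d_valid(image, kernel):
    """Cross-correlate image with kernel (no padding), as in a CNN layer."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[r + a][c + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for c in range(out_w)] for r in range(out_h)]

def max_pool(feature_map, size):
    """Non-overlapping max pooling, which also downsamples by `size`."""
    h, w = len(feature_map) // size, len(feature_map[0]) // size
    return [[max(feature_map[r * size + a][c * size + b]
                 for a in range(size) for b in range(size))
             for c in range(w)] for r in range(h)]

def make_image(top, left, size=6):
    """Blank image with a 2x2 patch of ones whose upper-left corner is (top, left)."""
    return [[1 if top <= r < top + 2 and left <= c < left + 2 else 0
             for c in range(size)] for r in range(size)]

def detect(image, kernel, threshold, pool_size):
    """Convolve, keep only responses above threshold, then pool."""
    fmap = conv2d_valid(image, kernel)
    rectified = [[max(0, x - threshold) for x in row] for row in fmap]
    return max_pool(rectified, pool_size)

kernel = [[1, 1], [1, 1]]                 # detector for a 2x2 bright patch
pooled_a = detect(make_image(0, 0), kernel, threshold=3, pool_size=2)
pooled_b = detect(make_image(1, 1), kernel, threshold=3, pool_size=2)
print(pooled_a, pooled_b)
```

The two feature maps before pooling differ, since the detector fires at different locations, but the pooled outputs coincide. The invariance is only local: a shift that moves the pattern into a different pooling cell changes the pooled output, which is one reason many such layers are stacked.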

The success of deep learning in reaching near human-level performance on certain highly

constrained prediction and classiﬁcation tasks, particularly in the area of computer vision,

has inspired interest in exploring the connections between deep neural networks and the

brain. Studies have shown similarities between the internal representations of convolu-

tional neural networks and representations in the primate visual system (Kriegeskorte 2015;

Yamins and DiCarlo 2016). Furthermore, the biological phenomenon of hippocampal replay

during memory consolidation prompted innovation in artiﬁcial intelligence, in part through

the incorporation of reinforcement learning (see Section 3.4.3) into deep learning architec-

tures (Mnih et al. 2015). On the other hand, some studies have shown cases in which

biological vision and deep networks diverge in performance (Nguyen et al. 2015; Ullman

et al. 2016). Even though they are not biologically realistic, deep learning architectures

may suggest new scientiﬁc hypotheses (Pelillo et al. 2015).

3.5. Connecting mathematical and statistical approaches in large networks

3.5.1. Bridging from dynamical to statistical models of neural spiking. In Section 2.4 we

made an explicit connection between an integrated form of LIF models and GLMs. An

alternative is to derive from a mechanistic model, ﬁrst, an instantaneous intensity by deter-

mining mean activity and, second, the variation around the mean. In binary models, the

ﬁrst step leads to a Gaussian integral (Van Vreeswijk and Sompolinsky 1998) and the second

to its derivative (Helias et al. 2014; Renart et al. 2010). For spiking models, these steps

are conceptually identical, but mathematically more involved. The ﬁring rate follows from

the mean ﬁrst passage time for the membrane voltage to exceed the threshold (Amit and

Brunel 1997; Tuckwell 1988). Computing deviations of responses from the mean requires

either perturbation theory applied to the Fokker-Planck equation (Richardson 2008) or sep-

aration of timescales for slow currents (Moreno-Bote and Parga 2010). These approaches

may be united in an elegant framework to produce an equivalent GLM model (Ostojic

and Brunel 2011b). Approximating the ﬂuctuations in spiking and binary networks up to

linear order, correlations are equivalent to those of linear stochastic diﬀerential equations

driven by Gaussian noise (Grytskyy et al. 2013). Extensions treat the mechanistic origins of

stimulus adaptation in statistical models of neural responses (Famulare and Fairhall 2010).

3.5.2. Multivariate relationships via latent variable models. An important question is

whether mechanistic models can reproduce features of recorded neural activity that go

beyond population means and variances. This is especially challenging when, as is usually

the case, recorded neurons represent only a very small sample from a vast network. Sim-

ple summary statistics, such as the variability of the activity of individual neurons or the

correlation between pairs of neurons, can be a helpful ﬁrst step (Litwin-Kumar and Doiron

2012). A natural next step is to examine summaries based on dimensionality reduction,

as in Section 3.4.1, where the same multivariate statistical methods are applied to both

the activity produced by the model and to the data. For example, spontaneous activity

recorded in the primary visual cortex has been found to be more like activity produced by a

spiking network model having clustered connections than that produced by a network with

uniform random connectivity (Williamson et al. 2016).

Mechanistic models can also help in characterizing the statistical tools used to study

neural population activity by providing ground truth with which to judge performance of

statistical methods (Williamson et al. 2016). This includes determination of the amount

of data needed in order to identify particular eﬀects. From results outlined in Section 2.4,

when LIF models are used these ground truth data sets should be very similar to others

generated using GLM neurons (Zaytsev et al. 2015), and it is a topic for future research to

take advantage of this relationship.

4. Outlook

In addition to providing readers with an entry into the mathematical and statistical liter-

ature in computational neuroscience, we have also tried to highlight places where the two

approaches go hand in hand, especially in Sections 2.4, 2.5, 2.6, and 3.5. Another con-

crete example of this interplay comes from anesthesia, where highly structured oscillations,

readily visible in the EEG, change in a systematic way, depending on the dose of a given

anesthetic and the molecular targets and neural circuits where the anesthetic acts (Brown

et al. 2011). One of the most widely used anesthetics, propofol, acts at multiple sites in the

brain to enhance the activity of inhibitory neurons resulting initially in beta oscillations

(13 - 25 Hz) followed within seconds by slow-delta oscillations (0.1 - 4 Hz), and then a

combination of slow-delta oscillations with alpha oscillations (8 - 12 Hz) when the patient is

unconscious. Multitaper spectral time series analysis showed that the alpha oscillations are

highly coherent across the front of the scalp, and this was explained by a circuit model using

Hodgkin-Huxley neurons (Ching et al. 2010; Cimenser et al. 2011). Because all anesthetics

create similar oscillations, the combination of careful statistical analysis and mechanistic

modeling may be used to investigate the way other anesthetics create altered brain states.

As this example illustrates, computational neuroscience, like experimental neuroscience,

aims to improve knowledge about the functioning of the nervous system. On the one hand,

the statistical approach helps by introducing methods to summarize nervous system data.

On the other hand, mathematical theory helps by introducing frameworks for describing

nervous system behavior. Because both sides of computational neuroscience aim to build

understanding from data, they complement each other: mechanistic models reﬁne scientiﬁc

questions, and can thereby guide development of statistical methods; statistical methods

can ﬁnd important features of data, and can suggest directions for modeling eﬀorts. As the

ﬁeld tackles additional complexity in modeling and data analysis, it will become increasingly

important for researchers in computational neuroscience to be cognizant of the essential

ideas, tools, and approaches of both domains.

DISCLOSURE STATEMENT

The authors are not aware of any aﬃliations, memberships, funding, or ﬁnancial holdings

that might be perceived as aﬀecting the objectivity of this review.

ACKNOWLEDGMENTS

This article was initiated during a workshop in October 2015 with support from the National

Science Foundation under Grant DMS-1127914 to the Statistical and Applied Mathematical

Sciences Institute. Additional conceptualization resulted from a second workshop in June,

2016, with support from the U.S.-Japan Brain Research Cooperative Program via NIMH

grant MH064537, NSF grant DMS 1612914, and the Japan Society for the Promotion of Science. Any opinions, findings, and conclusions or recommendations expressed in this

material are those of the authors and do not necessarily reﬂect the views of these funding

agencies. Additional work of individual authors was supported by individual research grants.

LITERATURE CITED

Abbott LF. 1999. Lapicque’s introduction of the integrate-and-ﬁre model neuron (1907). Brain

research bulletin 50:303–304

Abeles M. 1982. Role of the cortical neuron: integrator or coincidence detector? Israel journal of

medical sciences 18:83–92

Adrian ED, Zotterman Y. 1926. The impulses produced by sensory nerve endings. The Journal of

Physiology 61:465–483

Agresti A. 1996. Categorical data analysis, volume 990. New York: John Wiley & Sons

Albert M, Bouret Y, Fromont M, Reynaud-Bouret P. 2016. Surrogate data methods based on a

shuﬄing of the trials for synchrony detection: the centering issue. Neural Comput. 28:2352–2392

Aljadeff J, Lansdell BJ, Fairhall AL, Kleinfeld D. 2016. Analysis of neuronal spike trains, deconstructed. Neuron 91:221–259

Amarasingham A, Geman S, Harrison MT. 2015. Ambiguity and nonidentiﬁability in the statistical

analysis of neural codes. Proc. Natl. Acad. Sci. U. S. A. 112:6455–6460

Amari SI. 1977a. Dynamics of pattern formation in lateral-inhibition type neural ﬁelds. Biological

Cybernetics 27:77–87

Amari SI. 1977b. Neural theory of association and concept-formation. Biological Cybernetics

26:175–185

Amari SI, Nakahara H, Wu S, Sakai Y. 2003. Synchronous ﬁring and higher-order interactions in

neuron pool. Neural Comput. 15:127–142

Amit DJ, Brunel N. 1997. Model of global spontaneous activity and local structured activity during

delay periods in the cerebral cortex. Cerebral Cortex 7:237–252

Amit DJ, Gutfreund H, Sompolinsky H. 1987. Information storage in neural networks with low

levels of activity. Phys. Rev. A Gen. Phys. 35:2293–2303

Anderson JR. 2009. How can the human mind occur in the physical universe? Oxford University

Press

Bailey DL, Townsend DW, Valk PE, Maisey MN. 2005. Positron emission tomography. Springer

Bassett DS, Bullmore ET. 2016. Small-World brain networks revisited. Neuroscientist

Beggs JM, Plenz D. 2003. Neuronal avalanches in neocortical circuits. J. of Neurosci. 23:11167–

11177

Bengio Y, Lamblin P, Popovici D, Larochelle H. 2007. Greedy Layer-Wise training of deep networks.

In PB Schölkopf, JC Platt, T Hoffman, editors, Advances in Neural Information Processing

Systems 19, pages 153–160. MIT Press

Boole G. 1854. An Investigation of the Laws of Thought on which are Founded the Mathematical

Theories of Logic and Probabilities. Walton and Maberly

Bos H, Diesmann M, Helias M. 2016. Identifying anatomical origins of coexisting oscillations in the

cortical microcircuit. PLOS Computational Biology 12:1–34

Bressloﬀ PC. 2012. Spatiotemporal dynamics of continuum neural ﬁelds. Journal of Physics A:

Mathematical and Theoretical 45

Bressloff PC, Cowan JD, Golubitsky M, Thomas PJ, Wiener MC. 2001. Geometric visual hallucinations,

Euclidean symmetry and the functional architecture of striate cortex. Philosophical

Transactions of the Royal Society B 356:299–330

Brown EN, Purdon PL, Van Dort CJ. 2011. General anesthesia and altered states of arousal: a

systems neuroscience analysis. Annual review of neuroscience 34:601–628

Brunel N. 2000. Dynamics of sparsely connected networks of excitatory and inhibitory spiking

neurons. Journal of Computational Neuroscience 8:183–208

Brunel N, Van Rossum MC. 2007. Lapicque’s 1907 paper: from frogs to integrate-and-fire. Biological

cybernetics 97:337–339

Bullmore E, Sporns O. 2009. Complex brain networks: graph theoretical analysis of structural and

functional systems. Nat. Rev. Neurosci. 10:186–198

Bush RR, Mosteller F. 1955. Stochastic models for learning. John Wiley & Sons, Inc.

Buzsáki G, Mizuseki K. 2014. The log-dynamic brain: how skewed distributions affect network

operations. Nature Reviews Neuroscience 15:264–278

Cain N, Shea-Brown E. 2012. Computational models of decision making: integration, stability, and

noise. Current opinion in neurobiology 22:1047–1053

Carlson DE, Vogelstein JT, Wu Q, Lian W, Zhou M, Stoetzner CR, Kipke D, Weber D, Dunson

DB, Carin L. 2014. Multichannel electrophysiological spike sorting via joint dictionary learning

and mixture modeling. IEEE Transactions on Biomedical Engineering 61:41–54

Ching S, Cimenser A, Purdon PL, Brown EN, Kopell NJ. 2010. Thalamocortical model for a

propofol-induced α-rhythm associated with loss of consciousness. Proceedings of the National

Academy of Sciences 107:22665–22670

Churchland MM, Yu BM, Sahani M, Shenoy KV. 2007. Techniques for extracting single-trial activity

patterns from large-scale neural recordings. Curr. Opin. Neurobiol. 17:609–618

Cimenser A, Purdon PL, Pierce ET, Walsh JL, Salazar-Gomez AF, Harrell PG, Tavares-Stoeckel

C, Habeeb K, Brown EN. 2011. Tracking brain states under general anesthesia by using global

coherence analysis. Proceedings of the National Academy of Sciences 108:8832–8837

Cohen MR, Kohn A. 2011. Measuring and interpreting neuronal correlations. Nat. Neurosci.

14:811–819

Colquhoun D, Sakmann B. 1998. From muscle endplate to brain synapses: a short history of

synapses and agonist-activated ion channels. Neuron 20:381–387

Craik K. 1943. The nature of explanation. Cambridge University, Cambridge UK

Cunningham JP, Yu BM. 2014. Dimensionality reduction for large-scale neural recordings. Nat.

Neurosci. 17:1500–1509

Dayan P, Abbott LF. 2001. Theoretical neuroscience, volume 806. Cambridge, MA: MIT Press

Dayan P, Nakahara H. 2017. Reconstruction of recurrent synaptic connectivity of thousands of

neurons from simulated spiking activity. accepted

De La Rocha J, Doiron B, Shea-Brown E, Josić K, Reyes A. 2007. Correlation between neural spike

trains increases with ﬁring rate. Nature 448:802–806

Deger M, Schwalger T, Naud R, Gerstner W. 2014. Fluctuations and information ﬁltering in coupled

populations of spiking neurons with adaptation. Physical Review E 90:062704

Deng J, Dong W, Socher R, Li LJ, Li K, Fei-Fei L. 2009. ImageNet: A large-scale hierarchical

image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pages

248–255

Destexhe A, Mainen ZF, Sejnowski TJ. 1994. Synthesis of models for excitable membranes, synaptic

transmission and neuromodulation using a common kinetic formalism. J. Comput. Neurosci.

1:195–230

Doiron B, Litwin-Kumar A, Rosenbaum R, Ocker GK, Josić K. 2016. The mechanics of state-dependent

neural correlations. Nature neuroscience 19:383–393

Doiron B, Rinzel J, Reyes A. 2006. Stochastic synchronization in ﬁnite size spiking networks.

Physical Review E 74:030903

Ermentrout B, Terman DH. 2010. Foundations of mathematical neuroscience. Citeseer

Faisal AA, Selen LP, Wolpert DM. 2008. Noise in the nervous system. Nature Reviews Neuroscience

9:292–303

Famulare M, Fairhall A. 2010. Feature selection in simple neurons: how coding depends on spiking

dynamics. Neural Comput. 22:581–598

Fienberg SE. 2012. A brief history of statistical models for network analysis and open challenges.

J. Comput. Graph. Stat. 21:825–839

Fischl B, Salat DH, Busa E, Albert M, Dieterich M, Haselgrove C, Van Der Kouwe A, Killiany

R, Kennedy D, Klaveness S, et al. 2002. Whole brain segmentation: automated labeling of

neuroanatomical structures in the human brain. Neuron 33:341–355

Fitzhugh R. 1960. Thresholds and plateaus in the Hodgkin-Huxley nerve equations. J. Gen. Physiol.

43:867–896

Galvani L, Aldini G. 1792. De Viribus Electricitatis In Motu Musculari Commentarius Cum Joannis

Aldini Dissertatione Et Notis; Accesserunt Epistolae ad animalis electricitatis theoriam pertinentes.

Apud Societatem Typographicam

Geisler WS. 2011. Contributions of ideal observer theory to vision research. Vision research 51:771–

781

Geman S. 2006. Invariance and selectivity in the ventral visual pathway. Journal of Physiology-Paris

100:212–224

Geman S, Geman D. 1984. Stochastic relaxation, Gibbs distributions, and the Bayesian restoration

of images. IEEE Trans. Pattern Anal. Mach. Intell. 6:721–741

Gerhard F, Kispersky T, Gutierrez GJ, Marder E, Kramer M, Eden U. 2013. Successful reconstruction

of a physiological circuit with known connectivity from spiking activity alone. PLoS Comput

Biol 9:e1003138

Gerstein GL, Mandelbrot B. 1964. Random walk models for the spike activity of a single neuron.

Biophys. J. 4:41–68

Gerstner W, Kistler WM, Naud R, Paninski L. 2014. Neuronal dynamics: From single neurons to

networks and models of cognition. Cambridge University Press

Gerstner W, Naud R. 2009. How good are neuron models? Science 326:379–380

Ginzburg I, Sompolinsky H. 1994. Theory of correlations in stochastic neural networks. Phys Rev

E 50:3171–3191

Goedeke S, Diesmann M. 2008. The mechanism of synchronization in feed-forward neuronal networks.

New Journal of Physics 10:015007

Gold JI, Shadlen MN. 2007. The neural basis of decision making. Annu. Rev. Neurosci. 30:535–574

Grienberger C, Konnerth A. 2012. Imaging calcium in neurons. Neuron 73:862–885

Griffiths TL, Chater N, Norris D, Pouget A. 2012. How the Bayesians got their beliefs (and what

those beliefs actually are): comment on Bowers and Davis (2012).

Grillner S, Jessell TM. 2009. Measured motion: searching for simplicity in spinal locomotor networks.

Current opinion in neurobiology 19:572–586

Grün S. 2009. Data-driven significance estimation for precise spike correlation. Journal of Neurophysiology

101:1126–1140

Grytskyy D, Tetzlaﬀ T, Diesmann M, Helias M. 2013. A uniﬁed view on weakly correlated recurrent

networks. Frontiers in Computational Neuroscience 7:131

Gugerty L. 2006. Newell and Simon’s Logic Theorist: Historical background and impact on cognitive

modeling. Proc. Hum. Fact. Ergon. Soc. Annu. Meet. 50:880–884

Hämäläinen M, Hari R, Ilmoniemi RJ, Knuutila J, Lounasmaa OV. 1993. Magnetoencephalography -

theory, instrumentation, and applications to noninvasive studies of the working human brain.

Reviews of modern Physics 65:413

Harrison MT, Amarasingham A, Kass RE. 2013. Statistical identiﬁcation of synchronous spiking.

Spike Timing: Mechanisms and Function page 77

Harrison MT, Amarasingham A, Truccolo W. 2015. Spatiotemporal conditional inference and

hypothesis tests for neural ensemble spiking precision. Neural Comput. 27:104–150

Hartline HK, Graham CH. 1932. Nerve impulses from single receptors in the eye. Journal of

Cellular Physiology 1:277–295

Hebb DO. 1949. The organization of behavior: A neuropsychological approach. John Wiley & Sons

Helias M, Tetzlaﬀ T, Diesmann M. 2013. Echoes in correlated neural systems. New Journal of

Physics 15:023002

Helias M, Tetzlaﬀ T, Diesmann M. 2014. The correlation structure of local cortical networks

intrinsically results from recurrent dynamics. PLOS Computational Biology 10:e1003428

Hertz J. 2010. Cross-correlations in high-conductance states of a model cortical network. Neural

Comput. 22:427–447

Hille B. 2001. Ionic Channels of Excitable Membranes. Sinauer

Hinton GE, Sejnowski TJ. 1983. Optimal perceptual inference. In Proceedings of the IEEE conference

on Computer Vision and Pattern Recognition, pages 448–453

Hochreiter S, Schmidhuber J. 1997. Long short-term memory. Neural Comput. 9:1735–1780

Hodgkin AL, Huxley AF. 1952. A quantitative description of membrane current and its application

to conduction and excitation in nerve. J. Physiol. 117:500–544

Hong S, y Arcas BA, Fairhall AL. 2007. Single neuron computation: from dynamical system to

feature detector. Neural Comput. 19:3133–3172

Hopﬁeld JJ. 1982. Neural networks and physical systems with emergent collective computational

abilities. Proc. Natl. Acad. Sci. U. S. A. 79:2554–2558

Hopﬁeld JJ. 1984. Neurons with graded response have collective computational properties like those

of two-state neurons. Proceedings of the National Academy of Sciences 81:3088–3092

Hubel DH, Wiesel TN. 1959. Receptive ﬁelds of single neurones in the cat’s striate cortex. The

Journal of physiology 148:574–591

Izhikevich EM. 2007. Dynamical systems in neuroscience. MIT press

Jovanović S, Hertz J, Rotter S. 2015. Cumulants of Hawkes point processes. Phys. Rev. E Stat.

Nonlin. Soft Matter Phys. 91:042802

Kandel ER, Schwartz JH, Jessell TM, Siegelbaum SA, Hudspeth AJ. 2013. Principles of neural

science. McGraw-Hill, New York, 5th edition

Kass RE, Eden UT, Brown EN. 2014. Analysis of Neural Data. Springer Series in Statistics.

Springer New York

Kass RE, Ventura V. 2001. A spike-train probability model. Neural Comput. 13:1713–1720

Kaufman MT, Churchland MM, Ryu SI, Shenoy KV. 2014. Cortical activity in the null space:

permitting preparation without movement. Nat. Neurosci. 17:440–448

Kelly RC, Kass RE. 2012. A framework for evaluating pairwise and multiway synchrony among

stimulus-driven neurons. Neural Comput. 24:2007–2032

Kobayashi R, Tsubo Y, Shinomoto S. 2009. Made-to-order spiking neuron model equipped with a

multi-timescale adaptive threshold. Frontiers in computational neuroscience 3:9

Kopell N, Ermentrout G. 2002. Mechanisms of phase-locking and frequency control in pairs of

coupled neural oscillators. Handbook of dynamical systems 2:3–54

Körding K. 2007. Decision theory: What “should” the nervous system do? Science 318:606–610

Kriegeskorte N. 2015. Deep neural networks: A new framework for modeling biological vision and

brain information processing. Annual Review of Vision Science 1:417–446

Krizhevsky A, Sutskever I, Hinton GE. 2012. ImageNet classiﬁcation with deep convolutional

neural networks. In F Pereira, CJC Burges, L Bottou, KQ Weinberger, editors, Advances in

Neural Information Processing Systems 25, pages 1097–1105. Curran Associates, Inc.

Lansky P, Ditlevsen S. 2008. A review of the methods for signal estimation in stochastic diﬀusion

leaky integrate-and-ﬁre neuronal models. Biological cybernetics 99:253–262

Lapicque L. 1907. Recherches quantitatives sur l’excitation électrique des nerfs traitée comme une

polarization. J. Physiol. Pathol. Gen. 9:620–635

Lazar N. 2008. The statistical analysis of functional MRI data. Springer Science & Business Media

LeCun Y. 1989. Generalization and network design strategies. In R Pfeifer, Z Schreter, F Fogelman-Soulié,

L Steels, editors, Connectionism in perspective, pages 143–155. Zurich, Switzerland: Elsevier

LeCun Y, Bengio Y, Hinton G. 2015. Deep learning. Nature 521:436–444

Litwin-Kumar A, Doiron B. 2012. Slow dynamics and high variability in balanced cortical networks

with clustered connections. Nat. Neurosci. 15:1498–1505

Mainen ZF, Sejnowski TJ. 1995. Reliability of spike timing in neocortical neurons. Science 268:1503–

1506

Mante V, Sussillo D, Shenoy KV, Newsome WT. 2013. Context-dependent computation by recurrent

dynamics in prefrontal cortex. Nature 503:78–84

Marder E, Bucher D. 2001. Central pattern generators and the control of rhythmic movements.

Current biology 11:R986–R996

Markram H, Muller E, Ramaswamy S, Reimann MW, Abdellah M, Sanchez CA, Ailamaki A,

Alonso-Nanclares L, Antille N, Arsever S, Kahou GAA, Berger TK, Bilgili A, Buncic N, Chalimourda

A, Chindemi G, Courcol JD, Delalondre F, Delattre V, Druckmann S, Dumusc R, Dynes

J, Eilemann S, Gal E, Gevaert ME, Ghobril JP, Gidon A, Graham JW, Gupta A, Haenel V, Hay

E, Heinis T, Hernando JB, Hines M, Kanari L, Keller D, Kenyon J, Khazen G, Kim Y, King JG,

Kisvarday Z, Kumbhar P, Lasserre S, Le Bé JV, Magalhães BRC, Merchán-Pérez A, Meystre J,

Morrice BR, Muller J, Muñoz-Céspedes A, Muralidhar S, Muthurasa K, Nachbaur D, Newton

TH, Nolte M, Ovcharenko A, Palacios J, Pastor L, Perin R, Ranjan R, Riachi I, Rodríguez JR,

Riquelme JL, Rössert C, Sfyrakis K, Shi Y, Shillcock JC, Silberberg G, Silva R, Tauheed F,

Telefont M, Toledo-Rodriguez M, Tränkler T, Van Geit W, Díaz JV, Walker R, Wang Y, Zaninetta

SM, DeFelipe J, Hill SL, Segev I, Schürmann F. 2015. Reconstruction and simulation of

neocortical microcircuitry. Cell 163:456–492

Marr D. 1982. Vision: A computational approach

McClelland JL, Rumelhart DE. 1981. An interactive activation model of context eﬀects in letter

perception: I. an account of basic ﬁndings. Psychological review 88:375

McCulloch WS, Pitts W. 1943. A logical calculus of the ideas immanent in nervous activity. Bull.

Math. Biophys. 5:115–133

McGrayne SB. 2011. The theory that would not die: how Bayes’ rule cracked the enigma code,

hunted down Russian submarines, & emerged triumphant from two centuries of controversy. Yale

University Press

Medler DA. 1998. A brief history of connectionism. Neural Computing Surveys 1:18–72

Meliza CD, Kostuk M, Huang H, Nogaret A, Margoliash D, Abarbanel HD. 2014. Estimating

parameters and predicting membrane voltages with conductance-based neuron models. Biological

Cybernetics 108:495–516

Meng L, Kramer MA, Middleton SJ, Whittington MA, Eden UT. 2014. A uniﬁed approach to linking

experimental, statistical and computational analysis of spike train data. PloS one 9:e85269

Meyer C, van Vreeswijk C. 2002. Temporal correlations in stochastic networks of spiking neurons.

Neural Comput. 14:369–404

Mnih V, Kavukcuoglu K, Silver D, Rusu AA, Veness J, Bellemare MG, Graves A, Riedmiller M,

Fidjeland AK, Ostrovski G, et al. 2015. Human-level control through deep reinforcement learning.

Nature 518:529–533

Monteforte M, Wolf F. 2012. Dynamic ﬂux tubes form reservoirs of stability in neuronal circuits.

Phys. Rev. X 2:041007

Moreno-Bote R, Parga N. 2010. Response of integrate-and-ﬁre neurons to noisy inputs ﬁltered by

synapses with arbitrary timescales: Firing rate and correlations. Neural Comput. 22:1528–1572

Nagumo J, Arimoto S, Yoshizawa S. 1962. An active pulse transmission line simulating nerve axon.

Proceedings of the IRE 50:2061–2070

Nakahara H, Amari Si, Richmond BJ. 2006. A comparison of descriptive models of a single spike

train by information-geometric measure. Neural Comput. 18:545–568

Newell A, Simon H. 1956. The logic theory machine–a complex information processing system.

IEEE Trans. Inf. Theory 2:61–79

Nguyen A, Yosinski J, Clune J. 2015. Deep neural networks are easily fooled: High conﬁdence

predictions for unrecognizable images. In Proceedings of the IEEE Conference on Computer

Vision and Pattern Recognition, pages 427–436

Nunez PL, Srinivasan R. 2006. Electric Fields of the Brain: The Neurophysics of EEG. Oxford

University Press, New York, NY

Ohiorhenuan IE, Mechler F, Purpura KP, Schmid AM, Hu Q, Victor JD. 2010. Sparse coding and

high-order correlations in ﬁne-scale cortical networks. Nature 466:617–621

Okun M, Lampl I. 2008. Instantaneous correlation of excitation and inhibition during ongoing and

sensory-evoked activities. Nature Neuroscience 11:535–537

Ostojic S, Brunel N. 2011a. From spiking neuron models to linear-nonlinear models. PLOS Computational

Biology 7:e1001056

Ostojic S, Brunel N. 2011b. From spiking neuron models to linear-nonlinear models. PLoS Comput

Biol 7:e1001056

Ostojic S, Brunel N, Hakim V. 2009. How connectivity, background activity, and synaptic properties

shape the cross-correlation between spike trains. J. of Neurosci. 29:10234–10253

Paninski L, Brown EN, Iyengar S, Kass RE. 2009. Statistical models of spike trains. Stochastic

methods in neuroscience pages 278–303

Papo D, Zanin M, Martínez JH, Buldú JM. 2016. Beware of the Small-World neuroscientist! Front.

Hum. Neurosci. 10:96

Pelillo M, Scantamburlo T, Schiaffonati V. 2015. Pattern recognition between science and engineering:

A red herring? Pattern Recognit. Lett. 64:3–10

Perkel DH, Bullock TH. 1968. Neural coding. Neurosci. Res. Program Bull.

Piccinini G. 2004. The first computational theory of mind and brain: a close look at McCulloch and

Pitts’s logical calculus of ideas immanent in nervous activity. Synthese 141:175–215

Piccolino M. 1998. Animal electricity and the birth of electrophysiology: the legacy of Luigi Galvani.

Brain Res. Bull. 46:381–407

Pillow JW, Shlens J, Paninski L, Sher A, Litke AM, Chichilnisky EJ, Simoncelli EP. 2008. Spatio-

temporal correlations and visual signalling in a complete neuronal population. Nature 454:995–

999

Platkiewicz J, Stark E, Amarasingham A. 2017. Spike-centered jitter can mistake temporal structure.

Neural Comput. 29:783–803

Pnevmatikakis EA, Soudry D, Gao Y, Machado TA, Merel J, Pfau D, Reardon T, Mu Y, Laceﬁeld

C, Yang W, et al. 2016. Simultaneous denoising, deconvolution, and demixing of calcium imaging

data. Neuron 89:285–299

Prinz AA, Bucher D, Marder E. 2004. Similar network activity from disparate circuit parameters.

Nature neuroscience 7:1345–1352

Qin F, Auerbach A, Sachs F. 1997. Maximum likelihood estimation of aggregated Markov processes.

Proceedings of the Royal Society of London B: Biological Sciences 264:375–383

Rall W. 1962. Theory of physiological properties of dendrites. Ann. N. Y. Acad. Sci. 96:1071–1092

Renart A, De La Rocha J, Bartho P, Hollender L, Parga N, Reyes A, Harris KD. 2010. The

asynchronous state in cortical circuits. Science 327:587–590

Rescorla RA, Wagner AR. 1972. A theory of Pavlovian conditioning: Variations in the effectiveness

of reinforcement and nonreinforcement. In AH Black, WF Prokasy, editors, Classical conditioning

II: Current research and theory, volume 2, pages 64–99. New York

Rey HG, Pedreira C, Quiroga RQ. 2015. Past, present and future of spike sorting techniques. Brain

research bulletin 119:106–117

Richardson MJE. 2008. Spike-train spectra and network response functions for non-linear integrate-

and-ﬁre neurons. Biological Cybernetics 99:381–392

Riehle A, Grün S, Diesmann M, Aertsen A. 1997. Spike synchronization and rate modulation

diﬀerentially involved in motor cortical function. Science 278:1950–1953

Rinzel J. 1985. Excitation dynamics: insights from simpliﬁed membrane models. In Fed. Proc,

volume 44, pages 2944–2946

Rosenblatt F. 1958. The perceptron: a probabilistic model for information storage and organization

in the brain. Psychol. Rev. 65:386–408

Rosenblueth A, Wiener N, Bigelow J. 1943. Behavior, purpose and teleology. Philos. Sci. 10:18–24

Rotstein HG. 2015. Subthreshold amplitude and phase resonance in models of quadratic type:

Nonlinear eﬀects generated by the interplay of resonant and amplifying currents. Journal of

computational neuroscience 38:325–354

Rotstein HG, Oppermann T, White JA, Kopell N. 2006. A reduced model for medial entorhinal

cortex stellate cells: subthreshold oscillations, spiking and synchronization. Journal of Computational

Neuroscience 21:271–292

Roxin A, Brunel N, Hansel D. 2006. Rate Models with Delays and the Dynamics of Large Networks

of Spiking Neurons. Progress of Theoretical Physics Supplement 161:68–85

Roxin A, Brunel N, Hansel D, Mongillo G, van Vreeswijk C. 2011. On the distribution of ﬁring

rates in networks of cortical neurons. J. of Neurosci. 31:16217–16226

Rumelhart DE, McClelland JL, PDP Research Group. 1986. Parallel distributed processing: Explorations

in the microstructure of cognition. Volume 1: Foundations. The MIT Press, Cambridge,

MA

Sadtler PT, Quick KM, Golub MD, Chase SM, Ryu SI, Tyler-Kabara EC, Yu BM, Batista AP.

2014. Neural constraints on learning. Nature 512:423–426

Sakmann B, Neher E. 1984. Patch clamp techniques for studying ionic channels in excitable membranes.

Annual review of physiology 46:455–472

Santos GS, Gireesh ED, Plenz D, Nakahara H. 2010. Hierarchical interaction structure of neural

activities in cortical slice cultures. J. of Neurosci. 30:8720–8733

Schultz W. 2015. Neuronal reward and decision signals: from theories to data. Physiological Reviews

95:853–951

Schultz W, Dayan P, Montague PR. 1997. A neural substrate of prediction and reward. Science

275:1593–1599

Shadlen MN, Movshon JA. 1999. Synchrony unbound. Neuron 24:67–77

Shadlen MN, Newsome WT. 1998. The variable discharge of cortical neurons: Implications for

connectivity, computation, and information coding. J. of Neurosci. 18:3870–3896

Shannon CE, Weaver W. 1949. The Mathematical Theory of Communication. University of Illinois

Press

Shea-Brown E, Josić K, de la Rocha J, Doiron B. 2008. Correlation and synchrony transfer in

integrate-and-ﬁre neurons: basic properties and consequences for coding. Physical Review Letters

100:108102

Sherrington CS. 1897. The central nervous system. A Text-book of Physiology 3:60

Shimazaki H, Amari SI, Brown EN, Grün S. 2012. State-space analysis of time-varying higher-order

spike correlation for multiple neural spike train data. PLoS Comput. Biol. 8:e1002385

Shimazaki H, Sadeghi K, Ishikawa T, Ikegaya Y, Toyoizumi T. 2015. Simultaneous silence organizes

structured higher-order interactions in neural populations. Scientiﬁc reports 5

Sigworth F. 1977. Sodium channels in nerve apparently have two conductance states. Nature

270:265–267

Sigworth F. 1980. The variance of sodium current fluctuations at the node of Ranvier. The Journal

of physiology 307:97

Singer W. 1999. Neuronal synchrony: a versatile code for the deﬁnition of relations? Neuron

24:49–65, 111–25

Singer W, Gray CM. 1995. Visual feature integration and the temporal correlation hypothesis.

Annual review of neuroscience 18:555–586

Somjen GG. 2004. Ions in the brain: normal function, seizures, and stroke. Oxford University

Press

Staude B, Rotter S, Grün S. 2010. CuBIC: cumulant based inference of higher-order correlations in

massively parallel spike trains. Journal of computational neuroscience 29:327–350

Stigler SM. 1986. The history of statistics: The measurement of uncertainty before 1900. Harvard

University Press

Sutton RS, Barto AG. 1998. Reinforcement learning: An introduction, volume 1. MIT press

Cambridge

Swanson LW. 2012. Brain architecture: understanding the basic plan. Oxford University Press

Teramae JN, Tsubo Y, Fukai T. 2012. Optimal spike-based communication in excitable networks

with strong-sparse and weak-dense links. Sci. Rep. 2:485

Tetzlaﬀ T, Helias M, Einevoll G, Diesmann M. 2012. Decorrelation of neural-network activity by

inhibitory feedback. PLOS Computational Biology 8:e1002596

Thorndike EL. 1911. Animal intelligence: Experimental studies. Macmillan

Tien JH, Guckenheimer J. 2008. Parameter estimation for bursting neural models. Journal of

Computational Neuroscience 24:358–373

Torre E, Quaglio P, Denker M, Brochier T, Riehle A, Grün S. 2016. Synchronous spike patterns in

macaque motor cortex during an instructed-delay reach-to-grasp task. J. of Neurosci. 36:8329–

8340

Tranchina D. 2010. Population density methods in large-scale neural network modelling. In Stochastic

Methods in Neuroscience. Oxford University Press

Traub RD, Contreras D, Cunningham MO, Murray H, LeBeau FEN, Roopun A, Bibbig A, Wilent

WB, Higley MJ, Whittington MA. 2005. Single-column thalamocortical network model exhibiting

gamma oscillations, sleep spindles, and epileptogenic bursts. J. Neurophysiol. 93:2194–2232

Trousdale J, Hu Y, Shea-Brown E, Josić K. 2012. Impact of network structure and cellular response

on spike time correlations. PLOS Computational Biology 8:e1002408

Truccolo W. 2010. Stochastic models for multivariate neural point processes: Collective dynamics

and neural decoding. In Analysis of parallel spike trains, pages 321–341. Springer

Tuckwell HC. 1988. Introduction to Theoretical Neurobiology, volume 2. Cambridge University

Press, Cambridge

Turing AM. 1937. On computable numbers, with an application to the Entscheidungsproblem. Proc.

Lond. Math. Soc. 2:230–265

Ullman S, Assif L, Fetaya E, Harari D. 2016. Atoms of recognition in human and computer vision.

Proc. Natl. Acad. Sci. U. S. A. 113:2744–2749

van Vreeswijk C, Sompolinsky H. 1996. Chaos in neuronal networks with balanced excitatory and

inhibitory activity. Science 274:1724–1726

Van Vreeswijk C, Sompolinsky H. 1998. Chaotic balanced state in a model of cortical circuits.

Neural Comput. 10:1321–1371

Vavoulis DV, Straub VA, Aston JA, Feng J. 2012. A self-organizing state-space-model approach

for parameter estimation in Hodgkin-Huxley-type models of single neurons. PLoS Comput Biol

8:e1002401

Ventura V, Todorova S. 2015. A computationally eﬃcient method for incorporating spike waveform

information into decoding algorithms. Neural Comput.

Villringer A, Planck J, Hock C, Schleinkofer L, Dirnagl U. 1993. Near infrared spectroscopy (NIRS):

a new tool to study hemodynamic changes during activation of brain function in human adults.

Neuroscience letters 154:101–104

Walch OJ, Eisenberg MC. 2016. Parameter identifiability and identifiable combinations in generalized

Hodgkin–Huxley models. Neurocomputing 199:137–143

Wang W, Tripathy SJ, Padmanabhan K, Urban NN, Kass RE. 2015. An empirical model for reliable

spiking activity. Neural Comput.

Watts DJ, Strogatz SH. 1998. Collective dynamics of ‘small-world’ networks. Nature 393:440–442

Weber AI, Pillow JW. 2016. Capturing the dynamical repertoire of single neurons with generalized

linear models. arXiv preprint arXiv:1602.07389

Wei Y, Ullah G, Schiﬀ SJ. 2014. Uniﬁcation of neuronal spikes, seizures, and spreading depression.

J. Neurosci. 34:11733–11743

Whitehead AN, Russell B. 1912. Principia Mathematica. University Press

Wiener N. 1948. Cybernetics: Control and communication in the animal and the machine. Wiley

New York

Williamson RC, Cowley BR, Litwin-Kumar A, Doiron B, Kohn A, Smith MA, Yu BM. 2016. Scaling

properties of dimensionality reduction for neural populations and network models. PLoS Comput.

Biol. 12:e1005141

Wilson HR, Cowan JD. 1972. Excitatory and inhibitory interactions in localized populations of

model neurons. Biophysical Journal 12:1–24

Wolpert DM, Diedrichsen J, Flanagan JR. 2011. Principles of sensorimotor learning. Nature Reviews

Neuroscience 12:739–751

Yamins DLK, DiCarlo JJ. 2016. Using goal-driven deep learning models to understand sensory

cortex. Nat. Neurosci. 19:356–365

Yu BM, Cunningham JP, Santhanam G, Ryu SI, Shenoy KV, Sahani M. 2009. Gaussian-process

factor analysis for low-dimensional single-trial analysis of neural population activity. J. Neurophysiol.

102:614–635

Zaytsev YV, Morrison A, Deger M. 2015. Reconstruction of recurrent synaptic connectivity of

thousands of neurons from simulated spiking activity. Journal of computational neuroscience

39:77–103

Zhou P, Burton SD, Snyder AC, Smith MA, Urban NN, Kass RE. 2015. Establishing a statistical

link between network oscillations and neural synchrony. PLoS Comput Biol 11:e1004549

Zylberberg J, Cafaro J, Turner MH, Shea-Brown E, Rieke F. 2016. Direction-selective circuits shape

noise to ensure a precise population code. Neuron 89:369–383
