
Neuron 52, 409–423, November 9, 2006 ©2006 Elsevier Inc. DOI 10.1016/j.neuron.2006.10.017

Viewpoint

Optimal Information Storage in Noisy Synapses under Resource Constraints

Lav R. Varshney,1,2 Per Jesper Sjöström,3 and Dmitri B. Chklovskii2,*

1 Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139
2 Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11724
3 Wolfson Institute for Biomedical Research and Department of Physiology, University College London, London WC1E 6BT, United Kingdom

Summary

Experimental investigations have revealed that synapses possess interesting and, in some cases, unexpected properties. We propose a theoretical framework that accounts for three of these properties: typical central synapses are noisy, the distribution of synaptic weights among central synapses is wide, and synaptic connectivity between neurons is sparse. We also comment on the possibility that synaptic weights may vary in discrete steps. Our approach is based on maximizing the information storage capacity of neural tissue under resource constraints. Based on previous experimental and theoretical work, we use volume as a limited resource and utilize the empirical relationship between volume and synaptic weight. Solutions of our constrained optimization problems are not only consistent with existing experimental measurements but also make nontrivial predictions.

Introduction

As synapses play central roles in the two principal tasks of the brain, information processing and information storage (Ramón y Cajal, 1899), their properties have been the subject of extensive experimentation. Out of many important synaptic properties revealed over the years, we focus on the following three. First, synaptic connectivity is sparse not only in the brain in general, but also in local circuits. In other words, the probability of finding a synaptic connection between a randomly chosen pair of excitatory neurons, even nearby neurons, is much less than 1 (Holmgren et al., 2003; Isope and Barbour, 2002; Markram, 1997; Markram et al., 1997; Mason et al., 1991; Song et al., 2005; Thomson and Bannister, 2003; Thomson et al., 2002). Second, typical central synapses are noisy devices. Due, for example, to probabilistic transmitter release, firing of the presynaptic neuron occasionally fails to evoke an excitatory postsynaptic potential (EPSP). Moreover, the amplitude of the EPSP varies from trial to trial (Allen and Stevens, 1994; Hessler et al., 1993; Isope and Barbour, 2002; Mason et al., 1991; Raastad et al., 1992; Rosenmund et al., 1993; Sayer et al., 1990). Third, while the majority of synaptic weights are relatively weak (mean EPSP < 1 mV), the weight distribution is broad with a notable tail of stronger connections (Holmgren et al., 2003; Isope and Barbour, 2002; Markram et al., 1997; Mason et al., 1991; Sayer et al., 1990; Sjöström et al., 2001).

Although a unified theoretical framework capable of accounting for all of these properties does not exist, several of them have been addressed previously. The noisiness of typical central synapses seemed particularly puzzling since synapses act as conduits of information between neurons (Koch, 1999). Several theoretical studies have considered the impact of synaptic noise on information transmission through a synapse, generally in the context of sensory processing (Goldman, 2004; Levy and Baxter, 2002; Manwani and Koch, 2000; Zador, 1998). These papers have shown that, under some conditions or under some constraints, synaptic noisiness facilitates the efficiency of information transmission. Moreover, Laughlin et al. (1998) have pointed out that splitting information and transmitting it over several less reliable but metabolically cheaper channels reduces energy requirements. Adding information channels invokes costs associated with building and maintaining those channels (Levy and Baxter, 1996; Schreiber et al., 2002), which must also be taken into account (Sarpeshkar, 1998). In a separate line of inquiry, Brunel et al. (2004) explain the sparseness of synaptic connectivity and the distribution of synaptic weights in the cerebellum by maximizing the storage capacity of a perceptron network.

Here, we account for several properties of central synapses by developing a theoretical framework based on the role of synapses as mechanisms of information storage, rather than their dual role in transmitting information between neurons. It is widely believed that long-term memories are recorded in neuronal circuits through alteration in the strength of existing synapses (Hebb, 1949; McGaugh, 2000), through long-term potentiation (LTP) and long-term depression (LTD) (Lynch, 2004; Morris, 2003). Memories are retrieved by electrical activity of neurons that "reads out" the pattern of synaptic connectivity between them. Thus a synaptic memory system can be viewed as a communication channel from the present to the future (Figure 1). Although information storage is well recognized as a case of a general communication system (Csiszár and Körner, 1997; Eldridge, 1963; Immink et al., 1998) and information theory has been successfully applied in neuroscience (Rieke et al., 1997), the application of information theory to the analysis of synapses as memory elements has received little attention.

Our theoretical analysis is based on maximizing the information storage capacity of synapses under resource constraints. Generally, the information storage capacity of a system depends on the signal-to-noise ratio (SNR); in the case of synapses, this is the ratio between the average synaptic weight and the average noise. It would seem that the best strategy for increased information storage capacity would be to increase the synapse SNR; however, this increase in SNR comes at a cost. For given noise, increasing the SNR requires increasing the average synaptic weight. But the weight of an individual synaptic contact is positively correlated with its volume (comprising the combined volume of the spine head and axonal bouton) (Kasai et al., 2003; Lisman and Harris, 1993; Matsuzaki et al., 2001; Murthy et al., 2001; Nusser et al., 1998; Schikorski and Stevens, 1997; Takumi et al., 1999; Tanaka et al., 2005). As the weight of a synaptic connection is composed of the weights of individual synaptic contacts and its volume is the sum of contact volumes, the correlation between weight and volume should hold for the synaptic connection as a whole. Volume, however, is a costly resource (Cherniak et al., 1999; Chklovskii, 2004; Hsu et al., 1998; Mitchison, 1991; Ramón y Cajal, 1899). Thus, information storage capacity should be maximized under a volume constraint.

*Correspondence: mitya@cshl.org

Here, we cast the problem of long-term memory into the framework of information theory and deduce structural and connectivity properties of synapses that lead to optimal performance. Optimal performance is defined as the maximization of the brain's information storage capacity under constrained cost, quantified by synaptic volume. Note that we consider memory storage from a physical perspective, looking at the information storage density of neural tissue. Other than presupposing that volume is a constrained resource, our approach requires no specific assumptions regarding the network itself, such as a particular network organization or certain activity patterns. Previous work, however, examined the memory storage capacity of particular neural network models (Brunel et al., 2004; Gardner, 1987; McEliece et al., 1987; Newman, 1988; Rolls and Treves, 1998). That approach is different, since it assumes specific network designs and properties, thereby providing results that are in part due to the a priori assumptions.

The paper is organized as follows. In section I, we specify our model and formalize the empirical relationship between synaptic weight and synaptic volume. This preliminary step allows a quantitative tradeoff between storage capacity and cost. In section II, we show that synaptic connections should, on average, have small volume and consequently be noisy to maximize information storage capacity per unit volume. In section III, we determine the distribution of synaptic weights—not just the average structural property of synapses—that maximizes capacity for a particular synapse model. This optimal distribution includes many zero-weight connections, or potential synapses, which is in accordance with experimental observations of sparse synaptic connectivity. In section IV, we use an experimentally determined distribution of synaptic weights, as well as synaptic noise, to compute the cost function for which the information storage system operates optimally. The solution to this inverse problem makes more precise the synaptic weight–cost relationship specified in section I. In section V, we argue that discrete synaptic states may perform almost optimally or, in a slightly different yet naturally constrained model, better than continuous-valued synaptic states. Finally, section VI compares our theoretical predictions with known experimental data and suggests further tests of the theory.

Results

I. Relationship between Synaptic Weight and Volume

We start by formulating the model of a synapse in the information storage context, which is based on existing experimental observations. Although synaptically connected pairs of cortical neurons usually share multiple synaptic contacts (Kalisman et al., 2005; Koester and Johnston, 2005; Markram et al., 1997; Silver et al., 2003), here, unless specified otherwise, we refer to these contacts collectively as a synapse. Such a definition is motivated by electrophysiological measurements, which record the synaptic weight of all the contacts together.

We assume that information is stored in the synaptic weight, A, of each synapse. The weight can be obtained by averaging the EPSP amplitude measured in multiple trials in response to the firing of a presynaptic neuron. Then the standard deviation of the EPSP amplitude from trial to trial in a given synapse is the noise amplitude, A_N. As one might expect from the Poisson model of synaptic release (Bekkers and Stevens, 1995; del Castillo and Katz, 1954), the noise amplitude increases sublinearly with the synaptic weight. Recent measurements suggest a power law with an exponent of about 0.38 (Markram et al., 1997; Song et al., 2005) (see Figure S1 in the Supplemental Data available online).

The volume of a synapse is composed of individual synaptic contacts' volumes, which, in turn, correlate with the contribution of each synaptic contact to synaptic weight, as suggested by the following experimental results. Anatomically, an individual synaptic contact's volume correlates with many ultrastructural characteristics, such as the number and area of active zones, number of vesicles, area of the postsynaptic density, and the number of receptors (Lisman and Harris, 1993; Murthy et al., 2001; Nusser et al., 1998; Pierce and Mendell, 1993; Schikorski and Stevens, 1997; Streichert and Sargent, 1989; Takumi et al., 1999; Tanaka et al., 2005; Yeow and Peterson, 1991). Physiologically, an individual synaptic contact's volume correlates with the synaptic weight (Kasai et al., 2003; Matsuzaki et al., 2001). In fact, an increase in synaptic spine volume may sometimes accompany LTP (Matsuzaki et al., 2004; Kopec et al., 2006), whereas LTD may result in the converse volume decrease (Zhou et al., 2004).

Figure 1. Information Theoretic Models of Communication and Memory
(Left) Shannon's schematic diagram of a general communication system (Shannon, 1948). Here, incoming information is denoted A, whereas the information transmitted through and distorted by the channel is labeled B. (Right) Schematic diagram of a memory system cast as a communication system. In this view, information is stored in the input variable A, while the retrieved value—which is corrupted by the noise of the system—is indicated by B. The various sources of noise have been explicitly notated. Storage noise refers to noise that arises in the storage process, in situ noise refers to noise that perturbs the information while it is stored, and retrieval noise refers to noise in the retrieval process.

Because contributions of individual contacts to synaptic weight may add up linearly (Cash and Yuste, 1999), the volume of a synapse correlates with the synaptic weight. Indeed, a neuron can be viewed as a single computational unit (Chklovskii et al., 2004), as there is evidence that multiple synaptic contacts within a connected pair of neurons have correlated release probability (Koester and Johnston, 2005) and that the total synaptic connection weight correlates with the number of synaptic contacts (Kalisman et al., 2005). Alternatively, the integrative compartment may be smaller—such as a single dendritic branch (Poirazi et al., 2003; Polsky et al., 2004)—and individual synaptic contacts may in addition vary their weights independently. In this case, our model would have to be modified.

As the noise amplitude, A_N, is related by a power law to the mean EPSP amplitude, A, which is strongly correlated with the synapse volume, V, we can formulate the following scaling relationship:

V / V_N = (A / A_N)^a    (1.1)

where V_N is the volume of a synapse with an SNR of 1. Although existing experimental measurements (Kasai et al., 2003; Matsuzaki et al., 2001; Murthy et al., 2001; Schikorski and Stevens, 1997; Song et al., 2005; Takumi et al., 1999; Tanaka et al., 2005) support Equation 1.1, they are not sufficient to establish the value of the exponent, a.
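The two empirical scalings just described can be combined into a small numerical sketch. The code below is an illustration of ours, not from the paper: the prefactor c and the amplitude units are arbitrary assumptions, while the noise exponent 0.38 and the form of Equation 1.1 come from the text.

```python
def noise_amplitude(a_mean, c=1.0, exponent=0.38):
    """Sublinear noise scaling A_N = c * A^0.38; the exponent is the empirical
    value quoted in the text, the prefactor c is an arbitrary illustration."""
    return c * a_mean ** exponent

def snr(a_mean, c=1.0):
    """Signal-to-noise ratio A / A_N for a synapse with mean EPSP amplitude A."""
    return a_mean / noise_amplitude(a_mean, c)

def synapse_volume(a_mean, v_n=1.0, alpha=2.0, c=1.0):
    """Equation 1.1, V / V_N = (A / A_N)^alpha, solved for V."""
    return v_n * snr(a_mean, c) ** alpha
```

With c = 1, a synapse of amplitude A = 1 (in these arbitrary units) has SNR = 1 and therefore occupies exactly V_N; weaker synapses occupy less volume, consistent with the weight-volume correlation described above.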

II. Noisy Synapses Maximize Information Storage Capacity

In this section, we deduce the optimal average synaptic weight and volume by maximizing information storage capacity per unit volume. We invoke the synaptic weight/volume relationship formulated in the previous section (Equation 1.1) with a = 2; other cases will be considered in later sections. For a = 2, the problem of maximizing information storage capacity in a given volume reduces to the well-studied problem of maximizing channel capacity for a given input power. When the channel contributes additive white Gaussian noise (AWGN), such a problem is exactly solvable.

In our context, information is stored in the alteration of synaptic weights and retrieved by electrical activity. Then each synapse corresponds to a channel usage with information encoded in its weight. Maximum storage capacity is achieved when synaptic weights are uncorrelated. The retrieval noise is manifested in fluctuations of the EPSP from trial to trial. For concreteness, we assume here that the noise is Gaussian with a given variance; we will argue at the end of this section that the conclusions hold for other noise models.

Information storage capacity per synapse (measured in nats rather than bits) is given by the expression derived by Shannon (1948) for the AWGN channel:

I_synapse = (1/2) ln(1 + ⟨A²/A_N²⟩)    (2.1)

where ⟨A²/A_N²⟩ is the average SNR among synapses. The SNR of each synapse is defined as the square of the mean EPSP amplitude divided by the trial-to-trial variance of the EPSP amplitude. (In the AWGN model, A can take negative as well as positive values.) Using Equation 1.1, we can rewrite information storage capacity in terms of volume:

I_synapse = (1/2) ln(1 + ⟨V⟩/V_N)    (2.2)

where ⟨V⟩ is the average synapse volume.

As volume is a scarce resource, information storage capacity is likely to be optimized on a per-volume basis (see Introduction). For example, placing two or more smaller synapses (connecting different pairs of neurons) in the place of one larger synapse may increase memory capacity. Then the total storage capacity of a unit volume of neural tissue is

I_volume = I_synapse / (⟨V⟩ + V_0) = ln(1 + ⟨V⟩/V_N) / (2(⟨V⟩ + V_0))    (2.3)

where V_0 is the accessory volume necessary to support a synapse. Accessory volume includes the volume of wiring (axons and dendrites), glia, and perhaps extracellular space. Information storage capacity as a function of the size of the synapse, the relationship in Equation 2.3, is plotted in Figure 2A for different values of V_0.

Optimal storage capacity is achieved at the maximum of the I_volume-versus-⟨V⟩/V_N curve in Figure 2A. The maximum can be found by setting the derivative of Equation 2.3 to zero, as described in the Experimental Procedures. Figure 2B shows the dependence of the information storage capacity I_volume (peak height in Figure 2A) and the optimal synaptic volume ⟨V⟩ (horizontal coordinate of the peak in Figure 2A) on the accessory volume V_0. As would be expected, maximum information storage capacity per unit volume is achieved when the accessory volume V_0 is the smallest possible. In this regime, the average synapse volume ⟨V⟩ is much less than V_N and—according to Equation 1.1—synapses should therefore be noisy.
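The location of this maximum can also be checked numerically. The following sketch is ours, not from the paper; all volumes are in units of V_N. It implements Equation 2.3 and finds the optimal average synapse volume by ternary search, exploiting the fact that I_volume rises from zero and then decays, so it is unimodal in ⟨V⟩ for V_0 > 0.

```python
import math

def i_volume(v_avg, v0):
    """Equation 2.3 with volumes in units of V_N: capacity per unit volume."""
    return math.log(1.0 + v_avg) / (2.0 * (v_avg + v0))

def optimal_volume(v0, lo=1e-9, hi=100.0, iters=200):
    """Ternary search for the <V> that maximizes i_volume at accessory
    volume v0 > 0; valid because i_volume is unimodal in <V>."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if i_volume(m1, v0) < i_volume(m2, v0):
            lo = m1
        else:
            hi = m2
    return 0.5 * (lo + hi)
```

For V_0 = V_N, setting the derivative of Equation 2.3 to zero gives a closed-form optimum, ⟨V⟩ = e − 1, with peak capacity 1/(2e) nats per unit V_N; the search reproduces this, and shrinking V_0 moves the optimum toward smaller, noisier synapses, in line with the argument above.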

In reality, the accessory volume may not be infinitesimal, as this would affect system functionality adversely. For example, there is a hard limit on how thin axons can be (Faisal et al., 2005). Also, reducing wiring volume may increase conduction time delays and signal attenuation (Chklovskii et al., 2002). In fact, delay and attenuation are optimized when the wiring volume is of the same order as the volume of synapses (Wen and Chklovskii, 2005), which happens when they are of the order of V_N. Then the optimal performance—in terms of jointly maximizing information storage and minimizing conduction time delay and attenuation—is achieved when the average synapse volume ⟨V⟩ is either less than or of the order of V_N. In either case, we arrive at the conclusion that typical synapses should be noisy, in agreement with experimental observations.

The advantage of having greater numbers of smaller synapses is valid not only for the AWGN model that was considered above, but also for many reasonable noise and cost models. For these other models, the individual synapse channel capacity, I_synapse, is nondecreasing and logarithmic in the SNR. Thus, the inversely linear ⟨V⟩ term that arises from the number of synapses in the unit volume outpaces the logarithmic ⟨V⟩ term that arises from individual synapse capacity, and so total capacity decreases with increasing ⟨V⟩ for large ⟨V⟩.

An alternate way to see that the advantage of having greater numbers of smaller synapses extends to other reasonable noise models is through the concavity of the capacity cost function of information theory. The capacity cost function generalizes channel capacity by imposing average cost constraints on the channel inputs. Like channel capacity, it is the maximum rate at which one can transmit information over a channel while still achieving arbitrarily small probability of error; however, now the optimization is constrained by cost. This function is nondecreasing and concave downward (McEliece, 1977; Shannon, 1959), which means that the slope (capacity/cost) is larger at lower costs. If there are no zero-cost symbols, the capacity per unit cost is maximized at the average cost for which a line constrained to pass through the origin has its point of tangency to the capacity cost function. Such tangency points correspond to the location of the peaks in Figure 2A. If there is a zero-cost symbol, however, then the optimum is for zero average cost (the V_0 = 0 curve in Figure 2A). In many cases, it is difficult to find the optimum capacity per unit cost analytically (Verdú, 1990). There is, however, a numerical algorithm that can be used for such a computation (Jimbo and Kunisawa, 1979). Similar mathematical arguments have been used in the context of information transmission to show that having parallel, less reliable channels—such as synapses (Laughlin et al., 1998) and ion channels (Schreiber et al., 2002)—reduces metabolic costs.

In this section, we showed that—provided the accessory volume needed to support a synapse is small—numerous small and noisy synapses possess greater information storage capacity per unit volume than a few large and reliable synapses. This result may help explain why central synapses typically are unreliable (Allen and Stevens, 1994; Hessler et al., 1993; Isope and Barbour, 2002; Mason et al., 1991; Raastad et al., 1992; Rosenmund et al., 1993; Sayer et al., 1990).

III. Optimal Distribution of Synaptic Weights in the Discrete-States Model

Having established that synapses should be small and noisy on average, we next examine how volume and EPSP amplitude should be distributed among synapses. In the AWGN model used in section II, the capacity-achieving input distribution is also Gaussian (Shannon, 1948), and the synaptic volume is distributed exponentially. If the noise amplitude A_N is constant, the synaptic weight has a Gaussian distribution, as previously suggested (Brunel et al., 2004). If, on the other hand, A_N scales as a power of A (Figure S1), the synaptic weight distribution is a stretched (or compressed) exponential. Here, exponential and Gaussian distributions are two different, special cases.

However, it is not clear whether these predictions from the AWGN model can be taken at face value. First, the AWGN channel model allows both negative and positive signals, whereas synaptic weight is positive for excitatory synapses. Second, the Gaussian noise assumption is unlikely to hold, especially if synaptic weight must be non-negative. Third, synaptic volume may not scale as the synaptic weight SNR squared.

We therefore consider a different, discrete-states model, where the cost function can be chosen arbitrarily and the synaptic weight is non-negative, but which still yields an exactly solvable optimization problem. The reason an exact solution can be found is that the noise is treated approximately. Rather than considering a continuous distribution of synaptic weights, we consider a set of discrete synaptic states, with each state representing the range of weights in the continuous distribution that could be confused on retrieval due to noise. Then the difference in synaptic weight between adjacent states A_i and A_{i+1} is given by the sum of the two noise amplitudes, A_N(A_i) + A_N(A_{i+1}). From the information theoretic point of view, each state is viewed as a symbol from an alphabet, with each symbol characterized by a different cost (Figure 3A).

Figure 2. Results of the AWGN Model
(A) Information storage capacity per volume, V_N, of neural tissue as a function of normalized average synapse volume. The relationship between signal-to-noise ratio and volume for this plot uses accessory volume, V_0, values of 0, 1, and 10, normalized with respect to V_N. When V_0 = 0, the maximum storage capacity per unit volume occurs when average synapse volume is infinitesimal. When V_0 > 0, the finite maximum storage capacity per unit volume occurs at some non-zero normalized synapse volume.
(B) Blue line: Maximum information storage capacity per volume V_N of neural tissue (vertical coordinate of the peak in [A]) as a function of the accessory volume V_0. Red line: the corresponding average synapse volume ⟨V⟩ (horizontal coordinate of the peak in [A]).

Such conversion of the noisy continuous-valued input channel into a zero-error, discrete-valued input channel is a convenient approximation (Kolmogorov and Tihomirov, 1959; Root, 1968) because the mutual information of the noiseless channel reduces to the self-information of the channel input distribution (or, equivalently, the channel output distribution). By resorting to this approximation, we do not wish to imply that synaptic weights in the brain necessarily vary in discrete steps. In section VI, we will validate this approach by comparing its predictions to the predictions from a continuous channel model (section IV).

Since the self-information is identical to entropy, the maximization of information storage capacity per volume reduces to entropy maximization per volume, a standard problem from statistical physics (see Experimental Procedures). In neuroscience, such a mathematical problem has been solved in the context of neuronal communication by the spike code (Balasubramanian et al., 2001; de Polavieja, 2002, 2004). We consider a set of synaptic states, i, characterized by the EPSP amplitudes, A_i, and volume (or some other generalized cost), V_i (Figure 3A). We search for the probability distribution over synaptic states, p_i, that maximizes information storage capacity (measured in nats):

I_synapse = −Σ_i p_i ln p_i    (3.1)

per average volume of a synapse, V̄:

V̄ = Σ_i p_i V_i    (3.2)

Note that the average synaptic volume, V̄, includes the accessory volume, V_0, which was excluded from the definition of ⟨V⟩ used in the previous section.

We show in the Experimental Procedures that the probability distribution over synaptic states, p_i, that maximizes information capacity per volume is given by

p_i = exp(−βV_i)    (3.3)

where β is defined by the normalization condition Σ_i p_i = 1, and

I_volume = I_synapse / V̄ = β Σ_i V_i exp(−βV_i) / Σ_i V_i exp(−βV_i) = β    (3.4)

Motivated by experimental observations (Kopec et al., 2006), we assume that synaptic state volumes are distributed equidistantly, i.e., the volume of the ith synaptic state is given by

V_i = V_0 + 2iV_N    (3.5)

Then the average volume per potential synapse (including accessory volume, V_0), defined as the total volume divided by the number of potential synapses (including actual ones), can be expressed analytically (see Experimental Procedures) as

V̄ = V_0 + 2V_N exp(β(V_0 − 2V_N))    (3.6)

To allow comparison with empirical measurements (section VI), we also derive an expression for the average volume of actual synapses (see Experimental Procedures), i.e., states with i > 0, excluding the accessory volume, V_0 (Figure 3B):

⟨V⟩_{i>0} = 2V_N exp(βV_0)    (3.7)

The optimal average volume of actual synapses increases with the accessory volume. This result has an intuitive explanation: once the big investment in wiring (V_0) has already been made, it is advantageous to use bigger synapses that have higher SNR.

Figure 3. Discrete-States Model
(A) Synapses are modeled by a set of discrete noiseless synaptic states characterized by mean EPSP amplitude A_i and volume V_i. The difference in synaptic weight between adjacent states is A_N(A_i) + A_N(A_{i+1}).
(B) The average volume of actual synapses ⟨V⟩_{i>0} (blue line) and the fraction of synapses in i > 0 states, i.e., the filling fraction, f = 1 − p_0 (red line), as a function of accessory synapse volume normalized by V_N. The dashed line corresponds to the equipartition of volume into synapses and wires. The solution satisfying the competing requirements of maximizing information storage capacity and minimizing conduction delays must be at f < 0.5.


The ratio between the number of actual synapses and the number of potential synapses (including actual ones) is called the filling fraction, f. In our model, the filling fraction is just the fraction of synapses in states i > 0 and is given by (see Experimental Procedures)

f = exp(−2βV_N)    (3.8)

which is plotted in Figure 3B.
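Equations 3.3–3.8 are straightforward to verify numerically. The sketch below is an illustrative check of ours, with volumes in units of V_N. It solves the normalization condition for β by bisection, using the fact that Σ_i exp(−βV_i) = 1 over the equidistant states of Equation 3.5 is equivalent to exp(−βV_0) = 1 − exp(−2βV_N), and then recomputes the filling fraction and the average volumes by direct summation over the states.

```python
import math

V_N = 1.0  # all volumes below are measured in units of V_N

def solve_beta(v0, lo=1e-9, hi=50.0, iters=200):
    """Bisection for beta in exp(-beta*v0) = 1 - exp(-2*beta*V_N), i.e. the
    normalization sum_i p_i = 1 over the states V_i = v0 + 2*i*V_N."""
    g = lambda b: (1.0 - math.exp(-2.0 * b * V_N)) - math.exp(-b * v0)
    for _ in range(iters):   # g is increasing: g(0+) = -1, g(inf) = +1
        mid = 0.5 * (lo + hi)
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def model_summary(v0, n_states=2000):
    """Return (beta, filling fraction, V-bar, <V>_{i>0}) by direct summation."""
    beta = solve_beta(v0)
    p = [math.exp(-beta * (v0 + 2.0 * i * V_N)) for i in range(n_states)]
    v_bar = sum(pi * (v0 + 2.0 * i * V_N) for i, pi in enumerate(p))
    f = 1.0 - p[0]           # fraction of synapses in states i > 0
    v_actual = sum(pi * 2.0 * i * V_N for i, pi in enumerate(p)) / f
    return beta, f, v_bar, v_actual
```

The direct sums agree with the closed forms of Equations 3.6–3.8. A convenient exact check: for V_0 = 2V_N, normalization gives β = (ln 2)/2, f = 1/2, V̄ = 4V_N, and ⟨V⟩_{i>0} = 4V_N; shrinking V_0 drives the filling fraction down, toward sparser connectivity.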

Information storage capacity per volume can be calculated using Equations 3.4 and 3.6. Just as in the AWGN model, information storage capacity increases monotonically with decreasing accessory volume. Unlike the AWGN model, the growth of information storage capacity is unlimited. As information storage capacity diverges with decreasing accessory volume, V_0, optimal information storage is achieved when V_0 is as small as possible. In this limit, the filling fraction, f, is much less than 1, as illustrated in Figure 3B. This prediction is consistent with empirical observations of sparse connectivity. In addition, according to Figure 3B most actual synapses have volume 2V_N, and thus have an SNR of order 1 (Equation 1.1). This prediction is in agreement with the experimentally established noisiness of typical synapses.

Although local cortical circuits are sparse and typical synapses are noisy, the filling fraction is not infinitesimal. One explanation for this fact—which was discussed in the previous section—is that a very small V_0 affects system functionality adversely. The condition that accessory wire volume is of the order of synapse volume (Wen and Chklovskii, 2005) implies that ⟨V⟩ ∼ V_0/f. This condition is illustrated in Figure 3B by a dashed line intersecting the blue line. Then the competing desiderata of maximizing information storage and minimizing conduction delays should yield a value of V_0 less than at the intersection. The corresponding filling fraction is less than half, but not infinitesimal.

By using Equations 3.3 and 1.1, we can find the probability of synaptic states in terms of the EPSP amplitude:

p_i = exp(−β(V_0 + V_N(A_i/A_N)^a))    (3.9)

Such a distribution is called a stretched (or compressed) exponential and is compared with experimental data in section VI. In the continuum limit, when the probability changes smoothly between states, we can convert Equation 3.9 to a probability density. Considering that there should be one synaptic state per two noise amplitudes, 2A_N, the probability density of the EPSP distribution is given by

p(A) = (1 / 2A_N(A)) exp(−β(V_0 + V_N(A/A_N(A))^a))    (3.10)

Interestingly, the explicit consideration of noise does not alter the result, which follows from Equation 3.3, that for V_0/V_N → 0 optimum information storage is achieved by using mostly the i = 0 state, with i = 1 used with exponentially low frequency. If V_0 = 0, this type of problem can be solved exactly (Verdú, 1990, 2002), and the information storage capacity is maximized when, in addition to the zero-cost symbol, only one other symbol is chosen. The additional symbol is chosen to maximize the Kullback-Leibler (KL) divergence between the conditional probabilities of that symbol and of the zero-cost symbol, divided by the cost of the additional symbol. If V_0 > 0, however, the problem of optimizing information storage capacity cannot be solved analytically, prompting us to pursue a reverse approach discussed in the next section.

IV. Calculation of the Synaptic Cost Function from the Distribution of Synaptic Weights

The problem of directly and analytically finding the capacity-achieving input distribution and the channel capacity for a specified cost function is often rather difficult and is only known in closed form for certain special cases. In most cases, the channel capacity and the capacity-achieving input distribution are found using numerical algorithms (Arimoto, 1972; Blahut, 1972). In neuroscience, this algorithm was used in the context of optimal information transmission by the spike code (Balasubramanian et al., 2001; de Polavieja, 2002, 2004).

An alternative way to attack the optimization problem is to specify the channel noise distribution and the channel input distribution and then determine the channel input cost function for which the system is operating at capacity (Csiszár and Körner, 1997; Gastpar et al., 2003). This methodology does not seem to have been used for neuroscience investigations, other than for a brief look at sensory processing (Gastpar, 2003). Although this method inverts the problem specification, it seems reasonable if we are not sure of what the channel input cost function is (e.g., we do not know what a in Equation 1.1 is). The result of the computation is a cost function, which may then be examined for relevance to the problem at hand.

As before, we consider memory as a communication

channel (Figure 1). Information is stored in the input variable A, the retrieved (output) value of which is designated B. Gastpar et al. (2003) show that for a fixed channel input distribution p(A) and channel noise p(B|A), the system is optimal—in the sense of operating at capacity cost—if the cost function is of the form

V(A) = v D(p(B|A) ‖ p(B)) + v_0   (4.1)

where v > 0 and v_0 are arbitrary constants. Furthermore, D(·‖·) denotes the KL divergence, which quantifies the difference between the two probability distributions. KL divergence is zero if and only if the two distributions are identical. Note that the computed cost function is optimal for any accessory volume cost.
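For a discrete channel, Equation 4.1 can be evaluated directly. The sketch below computes the cost function from a given input distribution and channel matrix; the particular channel and distribution used in the check are arbitrary illustrations.

```python
import math

def inverse_cost(p_A, P, v=1.0, v0=0.0):
    """Equation 4.1: cost function V(a) = v*D(p(B|a) || p(B)) + v0 under which
    the given input distribution p_A is the optimal one.
    P[a][b] = p(B=b | A=a)."""
    m, n = len(p_A), len(P[0])
    p_B = [sum(p_A[a] * P[a][b] for a in range(m)) for b in range(n)]
    return [v * sum(P[a][b] * math.log(P[a][b] / p_B[b]) for b in range(n) if P[a][b] > 0) + v0
            for a in range(m)]
```

For a noiseless (identity) channel this reduces to V(a) = −ln p(a), anticipating Equation 4.3.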

First, we demonstrate that Equation 4.1 is valid in the

cases considered in previous sections. For the AWGN

model considered in section II, the input distribution

and the noise are Gaussian. Then the output distribution

is Gaussian as well. By substituting these distributions

into Equation 4.1, we can calculate the synaptic cost

function explicitly (see Experimental Procedures):

V(A) ≅ ∫ p(B|A) ln [p(B|A)/p(B)] dB ≅ A²   (4.2)

where ≅ implies equality up to an affine transformation. We find that the cost function is quadratic in synaptic weight, as was initially assumed in section II, thus validating Equation 4.1 for this case.
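The quadratic dependence can be checked directly from the closed-form KL divergence between Gaussians; the signal and noise variances below are illustrative values, not fitted quantities.

```python
import math

def kl_gauss(mu1, var1, mu2, var2):
    """D(N(mu1,var1) || N(mu2,var2)) in nats."""
    return 0.5 * (math.log(var2 / var1) + (var1 + (mu1 - mu2) ** 2) / var2 - 1.0)

var_noise, var_signal = 1.0, 2.0          # illustrative AWGN parameters
var_out = var_signal + var_noise          # output variance of the channel
# (D(a) - D(0)) / a^2 should be the constant 1/(2*var_out) for every input a,
# i.e., the cost is quadratic in a up to an affine transformation
ratios = [(kl_gauss(a, var_noise, 0.0, var_out) - kl_gauss(0.0, var_noise, 0.0, var_out)) / a ** 2
          for a in (0.5, 1.0, 2.0, 3.0)]
```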


Another example that validates Equation 4.1 is the

model of discrete noiseless synaptic states considered

in section III. In this case, p(B = A_i | A_j) = δ_ij and, using Equation 4.1, we find that the cost function is given by the logarithm of the input (or, alternatively, output) distribution (see Experimental Procedures):

V(A_j) ≅ Σ_i p(B = A_i | A_j) ln [p(B = A_i | A_j) / p(B = A_i)] ≅ −ln(p(A_j))   (4.3)

This is exactly what Equation 3.3 would predict, thus

providing another validation for Equation 4.1.

Next, we use Equation 4.1 to calculate the synaptic

cost function from experimentally measured distribu-

tions of synaptic weights and noise. We use the dataset

from Sjöström et al. (2001, 2003), also analyzed in Song

et al. (2005), where EPSPs were recorded in several con-

secutive trials for each of 637 synapses. To carry out this

calculation, we rely on the assumption that information

stored at a synapse, A, can be identified by the mean

EPSP amplitude. Then, the conditional density, p(B|A),

is estimated for each synapse as the distribution of

EPSP amplitudes across trials (Figure 4A). The marginal

density, p(B), is the distribution of EPSP amplitude over

all trials and synapses. By substituting these distribu-

tions into Equation 4.1 we find estimates of the cost

function, V(A), for each synapse (Figure 4B). A power

law with exponent 0.48 provides a satisfactory fit. Error

bars are obtained from a bootstrapping procedure

(see Experimental Procedures).
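The per-synapse estimate can be sketched as follows. The histogram-based KL estimator below is a simplified stand-in for the actual procedure (common bins, no weighted empirical CDF and no bootstrap), and the bin range is an assumption.

```python
import math

def kl_from_trials(trials, pooled, bins=10, lo=0.0, hi=5.0):
    """Rough estimate of D(p(B|A) || p(B)) for one synapse: histogram its
    trial EPSPs and the pooled EPSPs on common bins, then sum p*ln(p/q).
    Bins where either histogram is empty are skipped (a crude simplification)."""
    def hist(xs):
        h = [0] * bins
        for x in xs:
            h[min(int((x - lo) / (hi - lo) * bins), bins - 1)] += 1
        return [c / len(xs) for c in h]
    p, q = hist(trials), hist(pooled)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0 and qi > 0)
```

A narrowly distributed synapse embedded in a broad pooled distribution yields a positive divergence, while a synapse whose trials match the pooled distribution yields zero.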

V. Discrete Synapses May Provide Optimal

Information Storage

The models of synaptic information storage presented in

sections II and IV might have given the impression that

the optimal distribution of synaptic strength must be

continuous. Indeed, section II modeled information stor-

age in synapses by the AWGN channel with average

power constraint, for which the optimal, capacity-

achieving distribution is the continuous Gaussian distri-

bution (Shannon, 1948). In addition, using the methods

of section IV, one can construct numerous cost-con-

strained channels with continuous capacity-achieving

distributions.

Here we suggest that discrete synaptic states may

achieve optimal or nearly optimal information storage.

First, we point out that, surprisingly, not all continuous

input channels have optimal input distributions that

are continuous. In particular, imposing a constraint on

the maximum weight (or volume) of a single synapse

may change the optimal, capacity-achieving distribution

of synaptic weights from continuous form to a set of dis-

crete values. Such a maximum amplitude constraint is

quite natural from the biological point of view, because

neither volume nor EPSP can be infinitely large. Note

that, unlike in section III, where discreteness was an as-

sumption used to simplify mathematical analysis, here

the discrete solution emerges as a result of optimization.

For concreteness, we return to the AWGN channel

model considered in section II, but now we impose

a maximum weight constraint in addition to the average

volume constraint that was originally imposed. The

problem then reduces to the well-studied problem of

finding channel capacity for a given average input power

and peak input power. For the AWGN channel, the

unique optimal input distribution consists of a finite set

of points. A proof of this fact is based on methods of

convex optimization and mathematical analysis (Smith,

1971). Note that the Blahut-Arimoto algorithm for contin-

uous channels is based on sampling the input space

(Blahut, 1972) and cannot be used to determine whether

the optimal input distribution is continuous or discrete.

Consequently, an analytical proof is necessary.

Since it is known that the optimal input distribution

consists of a finite number of points, one can numeri-

cally search over this sequence of finite dimensional

spaces to find the locations and probabilities of these

points for particular average power and peak power

values. Moreover, there is a test procedure, based on

Figure 4. Synaptic Cost Function Calculated from EPSP Measurements

(A) Typical distributions of EPSP amplitude among trials for synapses characterized by different mean EPSP amplitudes.

(B) Synaptic cost function as a function of mean EPSP amplitude calculated from Equation 4.1 under the assumption of optimal information storage. Each data point represents a different synapse, with those appearing in (A) highlighted in red. Horizontal error bars represent the standard error for the mean EPSP amplitude; vertical error bars represent the standard error for the KL divergence quantity in Equation 4.1. The standard error was estimated by the bootstrap procedure described in Experimental Procedures. The points shown with starred upper vertical error had infinite vertical error, as estimated by the bootstrap procedure. The black line shows a least-squares power law fit with exponent 0.48.


the necessity of satisfying the Karush-Kuhn-Tucker op-

timality conditions, to determine whether the obtained

numerical solution is in fact optimal. So one can apply

the numerical procedure to generate a possible solution

and unmistakably recognize whether this solution is op-

timal (Smith, 1971). Applying Smith’s optimization pro-

cedure, including both the search and the test for opti-

mality, yields the following result for the AWGN

channel. For noise power 1, symmetric peak amplitude

constraint [−1.5, 1.5], and input power constraint 1.125 (an SNR close to 1), the optimal input distribution consists of the zero point with large probability, and the −1.5 and 1.5 points with equal smaller probability

(Smith, 1971) (see Figure S2).
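The mass assignments follow from the constraints: with symbols 0 and ±1.5, an average power of 1.125 forces probability 0.25 on each of ±1.5 and 0.5 on zero. The mutual information of this three-point input can then be checked by quadrature; this is only a numerical sketch, and Smith's optimality test itself is not reproduced here.

```python
import math

probs = {-1.5: 0.25, 0.0: 0.5, 1.5: 0.25}    # E[A^2] = 2 * 0.25 * 1.5**2 = 1.125

def mutual_info(inputs, var=1.0, lo=-8.0, hi=8.0, n=4000):
    """I(A;B) in nats for a discrete input over an AWGN channel of noise
    variance var, computed by midpoint quadrature over the output b."""
    def g(b, a):                              # Gaussian transition density
        return math.exp(-(b - a) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)
    db, total = (hi - lo) / n, 0.0
    for k in range(n):
        b = lo + (k + 0.5) * db
        pb = sum(p * g(b, a) for a, p in inputs.items())
        total += sum(p * g(b, a) * math.log(g(b, a) / pb) for a, p in inputs.items()) * db
    return total
```

Because this input also satisfies the average power constraint, its mutual information must lie below the unconstrained Gaussian-input capacity 0.5 ln(1 + 1.125).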

The conclusion that the distribution of synaptic

weights should be discrete valued holds not only for

the AWGN channel with hard limits imposed on synapse

size and weight, but also for other noise models. In par-

ticular, the discreteness result holds for a wide class of

additive noise channels under maximum amplitude con-

straint (Tchamkerten, 2004). Some fading channels that

have both additive and multiplicative noise and are sim-

ilarly constrained (Gursoy et al., 2005) also have this dis-

crete input property. Furthermore, channels other than

AWGN with constraints on both average power and

maximum amplitude have optimal input distributions

that consist of a finite number of discrete points (Huang

and Meyn, 2005).

A second observation is that—although there are

channels that have optimal input distributions that con-

sist of finite sets of discrete points—even channels that

have continuous optimal input distributions can be used

with discrete approximations of the optimal input distri-

bution and perform nearly at capacity. It is well known

that—in the average power constrained AWGN example

and in the limit of small SNR—the use of an alphabet

with only two symbols, ±⟨A²⟩^(1/2), does not significantly

reduce information storage capacity (Figure S3; also

see Experimental Procedures). In addition, Huang and

Meyn (2005) demonstrate numerically that discrete input

distributions, in some cases generated by sampling

the optimal continuous distribution, are only slightly

suboptimal.
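The low-SNR claim is easy to check numerically: compare the mutual information of the two-point input with the Gaussian-input capacity 0.5 ln(1 + SNR) at a small SNR (the value 0.1 below is illustrative).

```python
import math

def binary_awgn_mi(snr, lo=-10.0, hi=10.0, n=4000):
    """I(A;B) in nats for equiprobable inputs +/-sqrt(snr) over AWGN with
    unit noise variance, computed by midpoint quadrature."""
    a = math.sqrt(snr)
    def g(b, m):
        return math.exp(-(b - m) ** 2 / 2.0) / math.sqrt(2.0 * math.pi)
    db, total = (hi - lo) / n, 0.0
    for k in range(n):
        b = lo + (k + 0.5) * db
        pb = 0.5 * (g(b, a) + g(b, -a))
        total += sum(0.5 * g(b, m) * math.log(g(b, m) / pb) for m in (a, -a)) * db
    return total

snr = 0.1                                  # illustrative small SNR
binary = binary_awgn_mi(snr)
gaussian = 0.5 * math.log(1.0 + snr)       # capacity with a Gaussian input
```

The binary input is necessarily below the Gaussian-input capacity, but at small SNR the gap is a small fraction of the total.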

VI. Theoretical Predictions and Experiment

In this section, we compare theoretical predictions with

known experimental data and suggest further experi-

mental tests of the theory.

In section II, we find that, by considering an AWGN

channel, information storage capacity increases as
the accessory volume V_0 → 0. When the acces-

sory volume (V0) is less than the volume of a synapse

with unitary SNR (VN), storage is optimized by synapses

with average volume given by the geometric mean of V0

and VN. However, small accessory volume has a detri-

mental effect on the system functionality, because the

conduction time delay diverges as V_0 → 0. As the minimum conduction time delay is achieved when accessory

volume is of the order of the synaptic volume (Chklovskii

et al., 2002; Wen and Chklovskii, 2005), the competition

between these requirements results in the optimal mean

synaptic volume being less than or equal to VN. This re-

sult is corroborated in the discrete-states model of sec-

tion III, where optimal synaptic volume was found to be

2VN. Although the noise is not explicitly represented in

that model, the synapse volume is the minimum possi-

ble. These two results predict that typical synapses

should be small and consequently noisy. Indeed, Isope

and Barbour (2002) report an SNR of ~0.6 at parallel fi-

ber synapses onto cerebellar Purkinje cells. This finding

is in keeping with our prediction that the SNR should be

less than 1 but not infinitesimal. More generally, our pre-

diction is also in agreement with other experimental data

showing the noisiness of typical central synapses (Allen

and Stevens, 1994; Hessler et al., 1993; Isope and Bar-

bour, 2002; Mason et al., 1991; Raastad et al., 1992;

Rosenmund et al., 1993; Sayer et al., 1990).

In section III, we argue that optimal information stor-

age requires sparseness of synaptic connectivity, and

we predict a relationship between the filling fraction, f,

and the relative volume occupied by synapses and wires

(Equations 3.7 and 3.8). To make a quantitative compar-

ison with empirical observations, we consider a mouse

cortical column. Potential synaptic connectivity in a cor-

tical column is all to all, meaning that axons and den-

drites of any two neurons pass sufficiently close to

each other that they can be connected through local

synaptogenesis (Chklovskii et al., 2004; Kalisman

et al., 2005; Le Be and Markram, 2006; Stepanyants

and Chklovskii, 2005). According to Stepanyants et al.

(2002), the fraction of potential synapses converted

into actual ones in mouse cortex is ~0.3; we take this fraction to be our filling fraction, f ≈ 0.3. By using Equa-

tion 3.8, we find that 2βV_N = −ln(0.3) = 1.2, and by using Equation EP.9, we find that βV_0 = 0.36. Then, the average volume per actual synapse is of the same order as the accessory volume per actual synapse, V_0/f, in agreement

with experiments (Chklovskii et al., 2002). More detailed

calculation using Equation 3.7 shows that actual syn-

apse volume should be about 40% greater than acces-

sory volume per actual synapse. In reality, wire volume

is greater than synapse volume. This may be a conse-

quence of minimizing conduction delays as discussed

in section III. Hopefully, a future optimization framework

that combines conduction delays and information stor-

age capacity will account for this discrepancy.
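The arithmetic behind these numbers can be reproduced directly; the comparison of actual versus accessory volume per actual synapse uses Equations 3.7, 3.8, and EP.15, with all volumes multiplied by β so the unknown scale drops out.

```python
import math

f = 0.3                                   # filling fraction (Stepanyants et al., 2002)
two_beta_VN = -math.log(f)                # Equation 3.8: f = exp(-2*beta*V_N)
beta_V0 = -math.log(1.0 - f)              # Equation EP.15: exp(-beta*V_0) = 1 - f
# mean actual-synapse volume (Equation 3.7) vs. accessory volume per actual
# synapse, both in units of 1/beta
actual = two_beta_VN * math.exp(beta_V0)  # beta * 2 V_N exp(beta V_0)
accessory_per_actual = beta_V0 / f        # beta * V_0 / f
```

The two quantities come out within a factor of order one of each other, consistent with the "same order" statement above.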

Does this theory apply to the global brain network be-

yond the cortical column? In principle, sparseness of the

global network seems consistent with the high cost of

wiring. However, a quantitative analysis is complicated

by the fact that—for a network that does not possess

potential all-to-all connectivity—the wiring cost de-

pends not just on the numbers of synapses but also on

which particular synapses are implemented. Therefore,

a detailed analysis of such a network would require

characterizing the cost and the information storage ca-

pacity of dendritic and axonal arbors quantitatively.

This is a difficult problem, because a theory of neuronal

arbors does not yet exist.

In section III, we predict that synaptic volume follows
an exponential distribution with the decay constant
β (Equation 3.3). This prediction can be tested experi-

mentally by measuring the volume of spine heads and

boutons in cortical neuropil. In comparing the distribu-

tion of volume, one should keep in mind that we are re-

ferring to the total volume of all synaptic contacts be-

tween two neurons (section I). In addition, if one

measures the filling fraction in the same neuropil, the


test involves no fitting parameters because β can be calculated from the wiring volume and the filling fraction; from Equation EP.9 and Equation 3.8, we get that β = −ln(1 − f)/V_0. To overcome the difficulty in measuring V_N or V_0, one can alternatively measure the experimentally accessible quantities f and ⟨V⟩_{i>0} to determine β. Then, from Equations 3.7 and 3.8, β = −ln(f)/((1 − f)⟨V⟩_{i>0}). However, these predictions are only approximate, as the relative importance of maximizing information storage and minimizing conduction delays is unknown. In fact, these properties may vary depending on animal species, brain region, and animal age.
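As a consistency check on the two expressions for β, one can generate f, V_0, and ⟨V⟩_{i>0} from a known β and confirm that both routes recover it; β = 1 below is an arbitrary choice.

```python
import math

beta_true, f = 1.0, 0.3                          # illustrative values
V0 = -math.log(1.0 - f) / beta_true              # Equation EP.15
VN = -math.log(f) / (2.0 * beta_true)            # Equation 3.8
V_actual = 2.0 * VN * math.exp(beta_true * V0)   # Equation 3.7
beta_a = -math.log(1.0 - f) / V0                 # from wiring volume and f
beta_b = -math.log(f) / ((1.0 - f) * V_actual)   # from f and <V>_{i>0}
```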

In section III, we predict the distribution of synaptic

weight for arbitrary values of a (Equation 3.9), which

can be compared to the experimentally observed syn-

aptic weight distribution obtained in neocortical layer V neurons (Song et al., 2005). To perform such a comparison, we sort synaptic weights into bins [A_i − A_N(A_i), A_i + A_N(A_i)] and plot a histogram (Figure S4). By performing

a least-squares fit of the logarithm of the EPSP distribu-

tion we find that the distribution is a stretched exponen-

tial with exponent 0.49. A least-squares fit of the stan-

dard deviation of EPSP amplitude as a function of

mean EPSP amplitude (Figure S1) yields a power law

with exponent 0.38. Hence, A/A_N ~ A^0.62, and from Equation 3.9 we find that a = 0.49/0.62 = 0.79.
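The exponent bookkeeping is elementary but worth making explicit; the numbers are the fitted values quoted above.

```python
# exponent bookkeeping for the stretched-exponential fit (section VI)
noise_exponent = 0.38                        # A_N ~ A**0.38 (Figure S1 fit)
effective_exponent = 1.0 - noise_exponent    # hence A/A_N ~ A**0.62
a_estimate = 0.49 / effective_exponent       # Equation 3.9, stretched exponent 0.49
```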

In section IV, we established a reverse link from the

distribution of synaptic weights and noise statistics

to the synaptic cost function. The best power-law fit

to the points in Figure 4B yields a sublinear cost function with exponent ~0.48 (Figure 4B). Recalling that A/A_N ~ A^0.62, we find that a = 0.48/0.62 = 0.77. This estimate is

similar to that obtained using the discrete-states model,

thus validating the use of that model to approximate the

continuous distribution of synaptic weights (section III).

The prediction of a can be tested directly by measuring

the relationship between synaptic volume and weight.

Such an experiment would involve jointly measuring

the physical and electrophysiological properties of indi-

vidual synapses. Should the relationship between syn-

aptic weight and volume differ from that predicted in

section IV, other factors may contribute to the distribu-

tion of synaptic weights.

In section V, we argue that discrete synaptic states

could optimize information storage almost as well as—

and under some conditions better than—synapses

with continuous weights. This does not prove that syn-

apses with discrete states are strictly optimal; it merely

suggests that they could be. There is experimental evi-

dence that changes in the weights of individual synap-

ses are, in fact, discrete (Lisman, 2003; O’Connor

et al., 2005; Petersen et al., 1998), which seems consis-

tent with maximizing information storage. However, our

model assumes the same relationship between synaptic

weight and volume for all synapses, which is only ap-

proximately correct. For example, synapses that are

more distant from the soma must be bigger to ensure

that somatic EPSP remains the same in the face of elec-

trotonic attenuation (Magee and Cook, 2000). Although

the optimal solution is not known in this case, we spec-

ulate that even if individual synapses were to have dis-

crete states, these states would not be the same among

all the synapses. In other words, the finding that individ-

ual synapses may change in discrete steps during plas-

ticity (Lisman, 2003; O’Connor et al., 2005; Petersen

et al., 1998) would not necessarily make the overall dis-

tribution of synaptic weights discrete.

Is there any evidence of discreteness in the distribu-

tion of synaptic weights? The distribution reported in

Song et al. (2005) is not monotonic; it has a maximum

at around 0.2 mV. Recalling that the full distribution of

synaptic states should include EPSPs of zero amplitude

(absent and silent synapses), there is a gap between the

zero-amplitude EPSP and the peak at around 0.2 mV

(Figure S1; also see Figure 5 in Song et al., 2005), which

hints that synaptic weight may be increased in discrete

steps and not continuously. Finally, we note that this

distribution was obtained by combining data from hundreds of animals (Song et al., 2005). Even if synaptic

states were discrete within one animal, such discrete-

ness would presumably be blurred by interanimal

variability.

Discussion

We have argued that maximizing information storage

capacity per volume yields typical synapses that are

small and hence noisy. This explains experimental ob-

servations of synaptic unreliability. From the same prin-

ciple, we derived the distribution of synaptic weights

and found it to be a stretched exponential, in agreement

with existing measurements. This argument also ex-

plains the sparseness of synaptic connectivity. We

also suggest that the discreteness of synaptic states is

consistent with maximization of information storage

capacity.

The strength of the information theory approach is

that it provides an upper bound on information storage

simply based on physical limitations, without explicitly

considering how memory is stored, retrieved, coded,

or decoded. For example, it is not known whether

error-correcting codes are used in the brain when in-

formation is stored across numerous synapses. Re-

gardless of whether error-correcting codes are used or

not, the capacity-achieving input distribution must be

used for optimal performance. Thus, our explanations

and predictions stand irrespective of whether or how

the brain uses error correction codes in information

storage.

Nevertheless, the independence of results obtained using the information theory approach from any specific implementation is also its weakness, because the impact of

unknown mechanisms is difficult to assess. Although in-

formation theory provides physical limits on information

storage capacity, there could be other constraints due

tomechanisms ofstorage and read-out, as well as oper-

ation requirements on the network. Neural network

models commonly assume specific mechanisms and

yield information storage capacity estimates different

from ours (Brunel et al., 2004; Gardner, 1987; McEliece

et al., 1987; Newman, 1988; Rolls and Treves, 1998). In-

terestingly, Brunel et al. (2004) predict a distribution of

synaptic weights similar to ours, although results such

as this one may depend on the details of the neural net-

work model at hand. Future research is needed to shed

more light on the biological mechanisms that shape and

constrain information storage and retrieval.


As our analysis relies on optimizing information stor-

age capacity, it is not applicable to brain regions for

which information storage is not the main task. For ex-

ample, synapses associated with early sensory pro-

cessing, e.g., in the retina (Laughlin et al., 1998; Sterling

and Matthews, 2005), or calyx of Held (von Gersdorff

and Borst, 2002), or those belonging to motorneurons

(Pierce and Mendell, 1993; Yeow and Peterson, 1991)

may be large and reliable. This would be consistent

with optimizing information transmission. In actuality,

any given brain circuit probably contributes to both

information storage and information transmission. In-

deed, by applying our analysis in reverse, one could

infer the role of a given circuit from its structural char-

acteristics. In particular, different cortical layers may be

optimized for a different combination of storage and

processing.

Our formulation of memory in the Shannon framework

implicitly casts each synapse—both potential and ac-

tual—as a channel usage. The total storage capacity is

therefore the number of synapses multiplied by the aver-

age synaptic storage capacity. This makes the storage

capacity on the order of the number of synapses, which

would correspond to an overall maximal storage capac-

ity of several kilobits for a neocortical L5 pyramidal neu-

ron (Braitenberg and Schüz, 1998). It is possible, how-

ever, that the synaptic information retrieval mechanism

involves multiple read-out attempts from a single syn-

apse. Since each channel usage is separated in space

rather than in time, this does not increase the number

of channel usages. Regardless, one may wonder what

impact multiple read-out attempts would have on our

analysis of information storage capacity.

It is known that the SNR increases approximately as

the square root of the number of read-out trials for

most forms of signal integration (Harrington, 1955), so

if the information stored in each synapse was retrieved

using the same number of read-out attempts, this simply introduces a fixed multiplicative constant in Equation 1.1. A fixed constant in Equation 1.1 can simply be incorporated into the V_N term, and all of our results stand.

Contrarily, if the number of read-out attempts is not

fixed, but varies across different synapses, then it would cast our estimate of noise into doubt. We point out, however, that multiple read-out attempts would lead to large

time delays. Yet, if information is used to control dynam-

ical systems, it is known that large delay can be disas-

trous (Sahai and Mitter, 2006). In addition, it is not clear

how short-term plasticity caused by multiple read-out

attempts would be overcome.

Other possible concerns arise from the lack of a true

experimentally established input-output characteriza-

tion of synaptic memory. To address this concern would

require identification and description of the so-called

engram—the physical embodiment of memory—which

corresponds to the channel input, A (Figure 1). In addi-

tion, it would necessitate a better characterization of

the noise process that determines the input-output

probability distribution, p(B|A). Description of the alpha-

bet of A would furthermore settle the question, alluded

to in section V, of whether synapses are discrete valued

or continuously graded. In addition, we assumed in sec-

tion IV that the channel input letter A is given by the arith-

metic mean of EPSPs observed in several trials. Alterna-

tives to this assumption may alter the horizontal

coordinate of points in Figure 4B.

Although our analysis relies on identifying synaptic

noise with retrieval—or more specifically the variability

of EPSP amplitude on ‘‘read-out’’—the noise may also

come from other sources. The main concern is perhaps

that long-term memory storage at a synapse is open to

perturbations due to active learning rules and ongoing

neuronal activity (Zhou and Poo, 2004), the so-called in situ noise (Figure 1). The longer the information is stored,

the greater the perturbations caused by such processes

(although see Abraham et al., 2002). Under generic as-

sumptions, Amit and Fusi (1994) demonstrated that

this noise restricts memory capacity significantly and

even paradoxically. Fusi et al. (2005) recently proposed

a solution to this paradox: the introduction of a cascade

of synaptic states with different transition probabilities

results in a form of metaplasticity that increases reten-

tion times in the face of ongoing activity. Presumably,

other forms of metaplasticity may also help protect

stored information from unwanted perturbations. In ad-

dition, the stability of physiological synaptic plasticity

appears to depend critically on the details of activity

patterns during and after the induction of plasticity

(Zhou and Poo, 2004), suggesting that specific biologi-

cal mechanisms for the protection of stored information

may exist.

Our theory can be modified to include sources of

noise other than retrieval. For example, if in situ noise

is quantified and turns out to be dominant, it can be used in

the calculations presented in sections II–IV. In fact, opti-

mality of noisy synapses (section II) may be relevant to

the resolution of the above paradox. In general, a better

understanding of the system functionality including

characterization of storage, in situ, and retrieval noise

should help specify p(B|A) in the future.

Finally, our contributions include not only many expla-

nations and predictions of physical structures, but also

the introduction of methods developed elsewhere to

the study of memory in the brain. For example, to de-

velop the optimization principles, we have applied infor-

mation theory to the study of physical neural memory

systems. Moreover, our application of an alternate for-

mulation of the capacity cost problem to study the

cost function (section IV) appears to be the first instance

where this alternate characterization of channel optimality has been successfully applied to real system analysis, whether biological or human engineered. This prob-

lem inversion has wide applicability to the experimental

study of information systems.

Experimental Procedures

Calculation of the Optimum Average Synapse Volume

Here we calculate analytically the optimum average synapse volume ⟨V⟩ that maximizes information storage capacity per volume, I_volume, for given accessory volume V_0 and normalization V_N. This problem is mathematically identical to maximizing information transmission along parallel pathways (Sarpeshkar, 1998). We take the derivative of Equation 2.3 and set it to zero to obtain

2 ∂I_volume/∂⟨V⟩ = −[1/(⟨V⟩ + V_0)²] ln(1 + ⟨V⟩/V_N) + 1/[(⟨V⟩ + V_0)(⟨V⟩ + V_N)] = 0   (EP.1)

This implies that the optimal ⟨V⟩ can be found by solving the following equation:


(⟨V⟩ + V_0)/(⟨V⟩ + V_N) = ln(1 + ⟨V⟩/V_N)   (EP.2)

In the limiting cases, the optimizing average volume ⟨V⟩ and the maximum storage capacity achieved are given by (Sarpeshkar, 1998):

i) V_0 ≪ V_N:  ⟨V⟩ = √(V_0 V_N),  max{I_volume} = 1/(2V_N)

ii) V_0 ≫ V_N:  ⟨V⟩ ln(⟨V⟩/V_N) = V_0,  max{I_volume} = 1/(2⟨V⟩)   (EP.3)

The exact dependence of synaptic volume and the storage capacity on the accessory volume is shown in Figure 2B.
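Equation EP.2 is transcendental but easily solved numerically. The sketch below uses bisection (the parameter values are arbitrary) and checks that in the V_0 ≪ V_N regime the root agrees with √(V_0 V_N) up to a factor of order one.

```python
import math

def optimal_volume(V0, VN):
    """Solve Equation EP.2, (<V>+V0)/(<V>+VN) = ln(1 + <V>/VN), by bisection.
    The left side minus the right side is strictly decreasing in <V>."""
    def g(v):
        return (v + V0) / (v + VN) - math.log(1.0 + v / VN)
    lo, hi = 1e-12, 1e6 * VN      # g(lo) > 0 and g(hi) < 0 bracket the root
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```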

Derivation of Optimal Probability Distribution for Discrete

Zero-Error Synaptic States

Following Balasubramanian et al. (2001) and de Polavieja (2002, 2004), we first consider the problem of maximizing information storage capacity

I_synapse = −Σ_i p_i ln p_i   (EP.4)

in a given (average synaptic) volume

V̄ = Σ_i p_i V_i   (EP.5)

Both the volume constraint and the normalization condition for the probabilities of synaptic weights can be included in the constrained optimization by using Lagrange multipliers. Then we need to maximize

I_synapse = −Σ_i p_i ln p_i − β(Σ_i p_i V_i − V̄) − λ(Σ_i p_i − 1)   (EP.6)

By setting the derivatives of Equation EP.6 with respect to p_i equal to zero, we find that

p_i = (1/Z) exp(−βV_i)   (EP.7)

where Z = Σ_i exp(−βV_i) is a normalization constant (called the partition function in statistical physics) and β is implicitly specified by the condition

V̄ = (1/Z) Σ_i V_i exp(−βV_i)   (EP.8)

Recall now that V̄ is not given, and our objective is to maximize information per unit cost. Such an optimization problem can be solved, as pointed out by Balasubramanian et al. (2001), by choosing β such that the partition function Z = 1, i.e.,

Σ_i exp(−βV_i) = 1   (EP.9)

In this case, the probability expression (Equation EP.7) simplifies to

p_i = exp(−βV_i)   (EP.10)

Substituting this expression into Equation EP.6, we find that information storage capacity is given by

I_synapse = β Σ_i V_i exp(−βV_i)   (EP.11)

Combining this expression with Equation 3.2, we find that information per volume is given by

I_volume = I_synapse/V̄ = β Σ_i V_i exp(−βV_i) / Σ_i V_i exp(−βV_i) = β   (EP.12)

which is Equation 3.4 of the main text.
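These identities are easy to verify numerically: choose any finite set of volumes, solve Z(β) = 1 by bisection, and check that the resulting information per volume equals β. The volumes below are arbitrary.

```python
import math

V = [0.5, 1.0, 1.5, 2.0, 2.5]          # illustrative synaptic volumes

def Z(beta):                            # partition function, decreasing in beta
    return sum(math.exp(-beta * v) for v in V)

lo, hi = 1e-9, 100.0                    # Z(lo) > 1 > Z(hi) brackets Z(beta) = 1
for _ in range(200):
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if Z(mid) > 1.0 else (lo, mid)
beta = 0.5 * (lo + hi)

p = [math.exp(-beta * v) for v in V]            # Equation EP.10
I_synapse = -sum(q * math.log(q) for q in p)    # Equation EP.4
V_mean = sum(q * v for q, v in zip(p, V))       # Equation EP.5
```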

Distribution over Discrete Synaptic States Equidistant

in Volume Space

Sums appearing in Equations EP.9 and EP.11 can be expressed in

a closed form if we assume that

Vi=V0+2iVN

(EP.13)

Then we can rewrite the normalization condition (Equation EP.9) as

1 = Σ_{i=0}^∞ exp(−βV_i) = exp(−βV_0) + exp(−βV_0) Σ_{i=1}^∞ exp(−2iβV_N)
  = exp(−βV_0) + exp(−β(V_0 + 2V_N))/(1 − exp(−2βV_N))   (EP.14)

where we used an expression for the sum of the geometric series. Multiplying both sides of this expression by the denominator, we find

exp(−βV_0) + exp(−2βV_N) = 1.   (EP.15)

Average synapse volume (including accessory volume V_0) is given by

V̄ = Σ_{i=0}^∞ V_i exp(−βV_i) = −(∂/∂β) Σ_{i=0}^∞ exp(−βV_i) = −(∂/∂β) [exp(−βV_0)/(1 − exp(−2βV_N))]
  = [V_0 exp(−βV_0) + 2V_N exp(−2βV_N)]/exp(−βV_0) = V_0 + 2V_N exp(β(V_0 − 2V_N))   (EP.16)

which is Equation 3.6 of the main text. In the limiting cases, these expressions reduce to:

i) V_0 ≪ V_N:  V̄ = 2βV_N V_0,  βV_0 = exp(−2βV_N)

ii) V_0 ≫ V_N:  V̄ = V_0,  2βV_N = exp(−βV_0)   (EP.17)

The volume of actual synapses (excluding accessory volume V_0):

⟨V⟩_{i>0} = Σ_{i=1}^∞ 2iV_N exp(−βV_i) / Σ_{i=1}^∞ exp(−βV_i)
  = −(∂/∂β) [exp(−2βV_N)/(1 − exp(−2βV_N))] / [exp(−2βV_N)/(1 − exp(−2βV_N))]
  = 2V_N/(1 − exp(−2βV_N)) = 2V_N exp(βV_0)   (EP.18)

which is Equation 3.7 of the main text.

Finally, the filling fraction:

f = Σ_{i=1}^∞ p_i = exp(−β(V_0 + 2V_N))/(1 − exp(−2βV_N)) = exp(−2βV_N)   (EP.19)

which is Equation 3.8 of the main text.
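The closed-form sums EP.15–EP.19 can be confirmed against a direct (truncated) summation over the states of Equation EP.13; β and V_N below are arbitrary, and V_0 is then fixed by the normalization EP.15.

```python
import math

beta, VN = 1.0, 0.6                                      # illustrative values
V0 = -math.log(1.0 - math.exp(-2.0 * beta * VN)) / beta  # Equation EP.15
Vi = [V0 + 2 * i * VN for i in range(200)]               # Equation EP.13, truncated
p = [math.exp(-beta * v) for v in Vi]
f = sum(p[1:])                                           # filling fraction (EP.19)
V_mean = sum(q * v for q, v in zip(p, Vi))               # direct sum for EP.16
```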

Measurement of EPSPs in Synaptic Connections of L5

Pyramidal Neurons

Our cost function computation (section IV) and experimental plots (Figures S1 and S4) are based on the dataset from Sjöström et al. (2001, 2003), where detailed methods have been previously described. This dataset was analyzed with respect to connectivity patterns and synaptic weights in Song et al. (2005). Briefly, acute visual


cortical slices were cut from rats aged P12–P20; whole-cell record-

ing configuration was established on up to four thick-tufted neocor-

tical layer V pyramidal neurons using a gluconate-based internal

solution; connectivity was assessed using a minimum of ten traces;

EPSPs were measured using a 1 ms window centered on the peak of

the averaged EPSP trace. The dataset consisted of recordings from

637 connected pairs of neurons. Between 11 and 150 responses

were recorded in each connected pair (repeated every 7–20 s to

avoid short-term depression); the vast majority of connections ad-

mitted between 40 and 65 responses. We define the synaptic weight

as the mean EPSP averaged across all responses.

Computing the Optimal Cost Function for Gaussian

Input-AWGN Channel and Zero Error Channel

For the AWGN channel, let Z be the random variable that represents the independent additive noise. Then the expression in Equation 4.1, up to affine transformation, reduces to

D(p_{B|A}(b|a) ‖ p_B(b)) = D(p_Z(b − a) ‖ p_B(b)) = −h(p_Z(b − a)) − ∫ p_Z(b − a) ln p_B(b) db

where h(·) denotes differential entropy. Since we are only interested in the quantity up to affine transformation, we need not calculate the first entropy term explicitly, since it is constant; call it C_1.

\[ D(p_{B|A}(b|a)\,\|\,p_B(b)) = C_1 - \int \frac{1}{A_N\sqrt{2\pi}} \exp\!\left(\frac{-(b-a)^2}{2A_N^2}\right) \ln\!\left[\frac{1}{\sqrt{2\pi(\langle A^2\rangle + A_N^2)}} \exp\!\left(\frac{-b^2}{2(\langle A^2\rangle + A_N^2)}\right)\right] db \]
\[ = C_2 + C_3 \int \frac{1}{A_N\sqrt{2\pi}} \exp\!\left(\frac{-(b-a)^2}{2A_N^2}\right) b^2\,db = C_2 + C_3\,(A_N^2 + a^2) = C_4\,a^2 + C_5, \]

where C_2, C_3, C_4, and C_5 are other constants that are formed by combining terms that need not be calculated explicitly. The result is Equation 4.2 in the main text.
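As a quick sanity check on this reduction, the KL divergence between the two Gaussians can be evaluated with the standard closed form D(N(a, σ_N²) ‖ N(0, σ_B²)) = ln(σ_B/σ_N) + (σ_N² + a²)/(2σ_B²) − 1/2 and confirmed to be affine in a². The sketch below (Python, with made-up variances; not the original analysis code) shows that equally spaced values of a² give equal increments in the divergence:

```python
import math

def kl_gauss(a, var_n, var_b):
    """D( N(a, var_n) || N(0, var_b) ) in nats, via the standard closed form."""
    return 0.5 * (math.log(var_b / var_n) + (var_n + a * a) / var_b - 1.0)

# Made-up variances: noise A_N^2 = 1 and input <A^2> = 4, so output variance is 5.
var_n, var_b = 1.0, 5.0
d = [kl_gauss(a, var_n, var_b) for a in (0.0, 1.0, math.sqrt(2.0))]

# Equal increments over equally spaced a^2 values: the divergence is C4*a^2 + C5.
print(d[1] - d[0], d[2] - d[1])
```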

For the zero-error channel, the optimal cost function is

\[ D(p(B=A_i|A=A_j)\,\|\,p(B=A_i)) = \sum_{i=0}^{N} p(B=A_i|A=A_j)\,\ln\frac{p(B=A_i|A=A_j)}{p(B=A_i)} = \sum_{i=0}^{N} \delta_{ij}\,\ln\frac{\delta_{ij}}{p(B=A_i)} = -\ln p(A=A_j) = -\ln p_j. \]
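Because the conditional output distribution of a zero-error channel is a point mass on the transmitted letter, the KL divergence collapses to the negative log-probability of that letter. A minimal numerical check of this identity (Python, with an arbitrary example distribution):

```python
import math

def kl(cond, out):
    """Discrete KL divergence in nats, skipping zero-probability terms."""
    return sum(c * math.log(c / o) for c, o in zip(cond, out) if c > 0)

p = [0.7, 0.2, 0.1]  # arbitrary example input distribution over three letters

for j, pj in enumerate(p):
    delta = [1.0 if i == j else 0.0 for i in range(len(p))]  # p(B|A=A_j), zero-error channel
    assert abs(kl(delta, p) - (-math.log(pj))) < 1e-12       # reduces to -ln p_j

print([round(-math.log(pj), 3) for pj in p])  # rare letters carry the highest cost
```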

Calculation of the Synaptic Cost Function

In order to compute the optimal cost function according to Equation 4.1, we require the channel output distribution as well as the channel output distribution conditioned on the input. To estimate the channel output cumulative distribution function, F(B), we simply use the empirical cumulative distribution function, F_emp(B). To account for the variable number of EPSPs acquired from each synaptic connection, the step size in the empirical cumulative distribution function contributed by each data point is inversely proportional to the number of EPSPs obtained from the synapse in question. To model the effect of absent synapses, we included in the empirical distribution function a steep Gaussian distribution function with mean at zero EPSP amplitude and standard deviation of 0.1 mV (typical noise amplitude). The area under this Gaussian distribution function is given by one minus the filling fraction of 11.6% (Song et al., 2005). To estimate the channel conditional density, p(B|A), we assume that all EPSPs from a given synapse correspond to the same input letter; furthermore, we make a correspondence between the mean EPSP amplitude and this input letter. For each synapse, we use a histogram with ten uniformly spaced bins to estimate the conditional density, p_emp(B|A). Then the KL divergence is approximated by the following:

\[ D(p_{\mathrm{emp}}(B|A)\,\|\,p_{\mathrm{emp}}(B)) = \sum_{i=1}^{10} p_{\mathrm{emp}}(B=b_i|A)\,\ln\frac{p_{\mathrm{emp}}(B=b_i|A)}{F_{\mathrm{emp}}(B=r_i) - F_{\mathrm{emp}}(B=l_i)} \qquad \text{(EP.20)} \]

where B = b_i implies presence in the ith histogram bin, and the right and left bin edges are denoted r_i and l_i, respectively. This KL divergence is computed for each synapse, and the result is the optimal cost function (Figure 4B).
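The estimator just described can be sketched in a few lines. The code below (Python) is an illustrative reimplementation on synthetic EPSP data, not the original analysis code: the `synapses` list, its means, and the response counts are made-up stand-ins, while the 0.1 mV noise width, the 11.6% filling fraction, and the ten-bin histogram mirror the values quoted above.

```python
import math
import random

random.seed(0)

# Hypothetical stand-in data: a few "synapses", each a list of EPSP responses (mV).
synapses = [[random.gauss(m, 0.1) for _ in range(50)] for m in (0.3, 0.6, 1.2)]
fill = 0.116  # filling fraction from Song et al. (2005)

def norm_cdf(x, mu=0.0, sigma=0.1):
    # CDF of the narrow Gaussian modeling absent synapses (sigma = typical noise)
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def F_emp(x):
    """Empirical output CDF: each data point's step is inversely proportional to its
    synapse's response count (connected mass totals the filling fraction); absent
    synapses contribute a narrow Gaussian at zero with mass (1 - filling fraction)."""
    step_sum = sum(1.0 / len(s) for s in synapses for v in s if v <= x)
    return (1.0 - fill) * norm_cdf(x) + fill * step_sum / len(synapses)

def kl_ep20(responses, nbins=10):
    """Approximate D(p_emp(B|A) || p_emp(B)) for one synapse, as in Equation EP.20."""
    lo, hi = min(responses) - 1e-9, max(responses)  # left-open bins catch every point
    width = (hi - lo) / nbins
    d = 0.0
    for i in range(nbins):
        l = lo + i * width
        r = hi if i == nbins - 1 else lo + (i + 1) * width
        p = sum(1 for v in responses if l < v <= r) / len(responses)
        if p > 0:
            d += p * math.log(p / (F_emp(r) - F_emp(l)))
    return d

for s in synapses:
    print(round(sum(s) / len(s), 2), round(kl_ep20(s), 2))
```

Stronger synapses fall in rarer regions of the output distribution, so their estimated divergence (cost) is larger, which is the trend shown in Figure 4B.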

We use the bootstrap method to determine confidence intervals. The confidence interval represented by the horizontal error bars is a small-sample estimate of the standard error of the mean EPSP amplitude. The estimate is the sample standard deviation of the mean EPSP amplitudes generated by sampling with replacement from the empirical distribution of responses measured for each synapse. The result is based on 50 bootstrap trials. The confidence interval represented by the vertical error bars is a small-sample estimate of the standard error of the KL divergence quantity in Equation EP.20. In order to do the sampling with replacement for the bootstrap procedure, randomness is introduced in two ways. First, 637 synapses are selected uniformly with replacement from the set of 637 measured synapses. These resampled synapses are used to generate the empirical cumulative distribution function, F_emp(B). Second, for each of the measured synapses, EPSP values are selected uniformly with replacement to produce the empirical conditional probability mass functions p_emp(B|a) for each of the 637 values of a. Finally, Equation EP.20 is used to calculate a bootstrap version of the data points in Figure 4B. This doubly stochastic resampling procedure is repeated for 50 bootstrap trials, and the estimate is the sample standard deviation of the KL divergence quantities for each of the synapses. When we resample the synapses to generate F_emp(B) for each of the bootstrap trials, there is a possibility that no large-EPSP-amplitude synapse is selected. In such a case, the KL divergence quantity will come out infinite, since the output distribution will be zero where the conditional distribution is nonzero, so the two distributions will not be absolutely continuous with respect to each other. More simply, F_emp(B = r_i mV) will equal F_emp(B = l_i mV), so the denominator inside the logarithm in Equation EP.20 will be zero and cause the entire expression to be infinite. This lack of distribution overlap, which produces an infinite discontinuity in the KL divergence functional, causes the two infinite upper standard deviation errors represented by stars in Figure 4B. For these two points, the lower standard deviation is estimated by excluding points greater than the mean. For other points, the bootstrap distributions are approximately symmetric, so the vertical error bars are symmetrically replicated. The unweighted least-squares fit in the MATLAB curve-fitting toolbox is used to generate the fit to a function of the form vA^h with h = 0.48 (Figure 4B).
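The per-synapse half of this resampling can be illustrated as follows (Python; a simplified sketch with synthetic data that shows only the within-synapse resampling, using the mean EPSP as the statistic; the full procedure additionally resamples whole synapses to rebuild F_emp(B) and uses the KL divergence of Equation EP.20):

```python
import random
import statistics

random.seed(1)

# Hypothetical stand-in for the measured dataset: per-synapse EPSP response lists.
data = [[random.gauss(m, 0.1) for _ in range(40)] for m in (0.3, 0.6, 1.2)]

def statistic(responses):
    # Placeholder per-synapse quantity: the mean EPSP (horizontal error bars);
    # for the vertical error bars it would be the EP.20 KL divergence instead.
    return sum(responses) / len(responses)

def bootstrap_se(responses, ntrials=50):
    """Small-sample bootstrap standard error: resample EPSPs with replacement,
    recompute the statistic, and take the sample standard deviation."""
    vals = []
    for _ in range(ntrials):
        resampled = [random.choice(responses) for _ in responses]
        vals.append(statistic(resampled))
    return statistics.stdev(vals)

for s in data:
    print(round(statistic(s), 3), "+/-", round(bootstrap_se(s), 3))
```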

AWGN Channel Capacity at Low SNR

The information capacity of the AWGN channel with binary input \(\pm\langle A^2\rangle^{1/2}\) is achieved by using the two inputs with equal probability. Define the variable SNR to be

\[ \mathrm{SNR} = \frac{\langle A^2\rangle}{A_N^2}. \]

Perturbing the input values with Gaussian noise yields the output distribution

\[ p_B(b) = \frac{1}{2}\,\frac{1}{\sqrt{2\pi}}\exp\!\left(\frac{-(b-\sqrt{\mathrm{SNR}})^2}{2}\right) + \frac{1}{2}\,\frac{1}{\sqrt{2\pi}}\exp\!\left(\frac{-(b+\sqrt{\mathrm{SNR}})^2}{2}\right), \]

where the system has been normalized so that the channel output is measured in units of the noise amplitude: b → b/A_N. The output distribution then has entropy

\[ h(B) = -\int_{-\infty}^{\infty} p_B(b)\,\ln p_B(b)\,db, \]

which can be computed numerically. The capacity is then given by

\[ I(A;B) = h(B) - h(B|A) = h(B) - \tfrac{1}{2}\ln(2\pi e), \]

where we have used the property that for additive noise channels the conditional entropy is the entropy of the noise process, as well as the known entropy of Gaussian random variables. The value of the mutual information is close to the value from Equation 2.1 at small SNR (Figure S3).
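This numerical computation can be reproduced with a short script. The sketch below (Python; the integration range and grid size are assumptions) evaluates h(B) by Riemann summation and compares the binary-input mutual information with the Gaussian-input capacity ½ ln(1 + SNR), which it approaches at low SNR:

```python
import math

def binary_awgn_capacity(snr, lo=-12.0, hi=12.0, n=20001):
    """Mutual information (nats) for equiprobable inputs +/- sqrt(SNR) in
    unit-variance Gaussian noise: I = h(B) - 0.5*ln(2*pi*e), with the output
    entropy h(B) evaluated by Riemann summation on [lo, hi]."""
    s = math.sqrt(snr)
    dx = (hi - lo) / (n - 1)
    h = 0.0
    for i in range(n):
        b = lo + i * dx
        p = 0.5 / math.sqrt(2 * math.pi) * (
            math.exp(-((b - s) ** 2) / 2) + math.exp(-((b + s) ** 2) / 2))
        if p > 0:
            h -= p * math.log(p) * dx
    return h - 0.5 * math.log(2 * math.pi * math.e)

# Binary-input capacity approaches the Gaussian-input value 0.5*ln(1 + SNR) as SNR -> 0.
for snr in (0.01, 0.1, 1.0):
    print(snr, round(binary_awgn_capacity(snr), 4), round(0.5 * math.log(1 + snr), 4))
```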


Supplemental Data

The Supplemental Data for this article can be found online at http://

www.neuron.org/cgi/content/full/52/3/409/DC1/.

Acknowledgments

We are grateful to M. DeWeese, A. Koulakov, C. Machens, R. Malinow, and S.K. Mitter for commenting on the early versions of the manuscript and to A. Roth, Y. Mishchenko, M. Häusser, L. Srinivasan, and S.B. Nelson for discussions. We also thank the anonymous reviewers and the editor for helping clarify our presentation. This work was supported by the NIH Grant MH69838, the Klingenstein Foundation Award, an NSF Graduate Research Fellowship, the NSF Grant CCR-0325774, the Wellcome Trust, and an EU Marie Curie grant.

Received: May 4, 2005

Revised: April 27, 2006

Accepted: October 10, 2006

Published: November 8, 2006

References

Abraham, W.C., Logan, B., Greenwood, J.M., and Dragunow, M.

(2002). Induction and experience-dependent consolidation of stable

long-term potentiation lasting months in the hippocampus. J. Neu-

rosci. 22, 9626–9634.

Allen, C., and Stevens, C.F. (1994). An evaluation of causes for unre-

liability of synaptic transmission. Proc. Natl. Acad. Sci. USA 91,

10380–10383.

Amit, D.J., and Fusi, S. (1994). Learning in neural networks with ma-

terial synapses. Neural Comput. 6, 957–982.

Arimoto, S. (1972). An algorithm for computing the capacity of arbi-

trary discrete memoryless channels. IEEE Trans. Inform. Theory IT-

18, 14–20.

Balasubramanian, V., Kimber, D., and Berry, M.J., II. (2001). Meta-

bolically efficient information processing. Neural Comput. 13, 799–

815.

Bekkers, J.M., and Stevens, C.F. (1995). Quantal analysis of EPSCs

recorded from small numbers of synapses in hippocampal cultures.

J. Neurophysiol. 73, 1145–1156.

Blahut, R.E. (1972). Computation of channel capacity and rate-

distortion functions. IEEE Trans. Inform. Theory IT-18, 460–473.

Braitenberg, V., and Schüz, A. (1998). Cortex: Statistics and Geometry of Neuronal Connectivity (Berlin, New York: Springer).

Brunel, N., Hakim, V., Isope, P., Nadal, J.-P., and Barbour, B. (2004).

Optimal information storage and the distribution of synaptic

weights: perceptron versus Purkinje cell. Neuron 43, 745–757.

Cash, S., and Yuste, R. (1999). Linear summation of excitatory inputs by CA1 pyramidal neurons. Neuron 22, 383–394.

Cherniak, C., Changizi, M., and Kang, D.W. (1999). Large-scale opti-

mization of neuron arbors. Phys. Rev. E Stat. Phys. Plasmas Fluids

Relat. Interdiscip. Topics 59, 6001–6009.

Chklovskii, D.B. (2004). Synaptic connectivity and neuronal mor-

phology: two sides of the same coin. Neuron 43, 609–617.

Chklovskii, D.B., Schikorski, T., and Stevens, C.F. (2002). Wiring op-

timization in cortical circuits. Neuron 34, 341–347.

Chklovskii, D.B., Mel, B.W., and Svoboda, K. (2004). Cortical rewiring and information storage. Nature 431, 782–788.

Csiszár, I., and Körner, J. (1997). Information Theory: Coding Theorems for Discrete Memoryless Systems (Budapest: Akadémiai Kiadó).

de Polavieja, G.G. (2002). Errors drive the evolution of biological sig-

nalling to costly codes. J. Theor. Biol. 214, 657–664.

de Polavieja, G.G. (2004). Reliable biological communication with realistic constraints. Phys. Rev. E Stat. Nonlin. Soft Matter Phys. 70, 061910. 10.1103/PhysRevE.70.061910.

del Castillo, J., and Katz, B. (1954). Quantal components of the end-

plate potential. J. Physiol. (London) 124, 560–573.

Eldridge, D.F. (1963). A special application of information theory to

recording systems. IEEE Trans. Audio AU-11, 3–6.

Faisal, A.A., White, J.A., and Laughlin, S.B. (2005). Ion-channel noise

places limits on the miniaturization of the brain’s wiring. Curr. Biol.

15, 1143–1149.

Fusi, S., Drew, P.J., and Abbott, L.F. (2005). Cascade models of syn-

aptically stored memories. Neuron 45, 599–611.

Gardner, E. (1987). Maximum storage capacity in neural networks.

Europhys. Lett. 4, 481–485.

Gastpar, M. (2003). To code or not to code. PhD dissertation, École Polytechnique Fédérale de Lausanne, Switzerland.

Gastpar, M., Rimoldi, B., and Vetterli, M. (2003). To code, or not to

code: lossy source-channel communication revisited. IEEE Trans.

Inform. Theory 49, 1147–1158.

Goldman, M.S. (2004). Enhancement of information transmission ef-

ficiency by synaptic failures. Neural Comput. 16, 1137–1162.

Gursoy, M.C., Poor, H.V., and Verdú, S. (2005). The noncoherent Rician fading channel—part I: structure of the capacity-achieving input. IEEE Trans. Wireless Commun. 4, 2193–2206.

Harrington, J.V. (1955). An analysis of the detection of repeated sig-

nals in noise by binary integration. IRE Trans. Inform. Theory 1, 1–9.

Hebb, D.O. (1949). The Organization of Behavior: A Neuropsycho-

logical Theory (New York: Wiley).

Hessler, N.A., Shirke, A.M., and Malinow, R. (1993). The probability

of transmitter release at a mammalian central synapse. Nature

366, 569–572.

Holmgren, C., Harkany, T., Svennenfors, B., and Zilberter, Y. (2003).

Pyramidal cell communication within local networks in layer 2/3 of

rat neocortex. J. Physiol. 551, 139–153.

Hsu, A., Tsukamoto, Y., Smith, R.G., and Sterling, P. (1998). Func-

tional architecture of primate cone and rod axons. Vision Res. 38,

2539–2549.

Huang, J., and Meyn, S.P. (2005). Characterization and computation

of optimal distributions for channel coding. IEEE Trans. Inform. The-

ory 51, 2336–2351.

Immink, K.E.S., Siegel, P.H., and Wolf, J.K. (1998). Codes for digital

recorders. IEEE Trans. Inform. Theory 44, 2260–2299.

Isope, P., and Barbour, B. (2002). Properties of unitary granule cell/Purkinje cell synapses in adult rat cerebellar slices. J. Neurosci. 22, 9668–9678.

Jimbo, M., and Kunisawa, K. (1979). An iteration method for calculating the relative capacity. Information and Control 43, 216–223.

Kalisman, N., Silberberg, G., and Markram, H. (2005). The neocorti-

cal microcircuit as a tabula rasa. Proc. Natl. Acad. Sci. USA 102,

880–885.

Kasai, H., Matsuzaki, M., Noguchi, J., Yasumatsu, N., and Nakahara,

H. (2003). Structure-stability-function relationships of dendritic

spines. Trends Neurosci. 26, 360–368.

Koch, C.(1999).Biophysics ofComputation:Information Processing

in Single Neurons (New York: Oxford University Press).

Koester, H.J., and Johnston, D. (2005). Target cell-dependent nor-

malization of transmitter release at neocortical synapses. Science

308, 863–866.

Kolmogorov, A., and Tihomirov, V. (1959). ε-entropy and ε-capacity of sets in functional spaces. Uspekhi Matematicheskikh Nauk 14, 3–86.

Kopec, C.D., Li, B., Wei, W., Boehm, J., and Malinow, R. (2006). Glu-

tamate receptor exocytosis and spine enlargement during chemi-

cally induced long-term potentiation. J. Neurosci. 26, 2000–2009.

Laughlin, S.B., de Ruyter van Steveninck, R.R., and Anderson, J.C. (1998). The metabolic cost of neural information. Nat. Neurosci. 1, 36–41.

Le Bé, J.V., and Markram, H. (2006). Spontaneous and evoked synaptic rewiring in the neonatal neocortex. Proc. Natl. Acad. Sci. USA 103, 13214–13219.

Levy, W.B., and Baxter, R.A. (1996). Energy efficient neural codes.

Neural Comput. 8, 531–543.


Levy, W.B., and Baxter, R.A. (2002). Energy-efficient neuronal com-

putation via quantal synaptic failures. J. Neurosci. 22, 4746–4755.

Lisman, J. (2003). Long-term potentiation: outstanding questions and attempted synthesis. Proc. R. Soc. Lond. B Biol. Sci. 358, 829–842.

Lisman, J.E., and Harris, K.M. (1993). Quantal analysis and synaptic

anatomy–integrating two views of hippocampal plasticity. Trends

Neurosci. 16, 141–147.

Lynch, M.A. (2004). Long-term potentiation and memory. Physiol.

Rev. 84, 87–136.

Magee, J.C., and Cook, E.P. (2000). Somatic EPSP amplitude is independent of synapse location in hippocampal pyramidal neurons. Nat. Neurosci. 3, 895–903.

Manwani, A., and Koch, C. (2000). Detecting and estimating signals

over noisy and unreliable synapses: information-theoretic analysis.

Neural Comput. 13, 1–33.

Markram, H. (1997). A network of tufted layer 5 pyramidal neurons.

Cereb. Cortex 7, 523–533.

Markram, H., Lübke, J., Frotscher, M., Roth, A., and Sakmann, B. (1997). Physiology and anatomy of synaptic connections between thick tufted pyramidal neurones in the developing rat neocortex. J. Physiol. 500, 409–440.

Mason, A., Nicoll, A., and Stratford, K. (1991). Synaptic transmission between individual pyramidal neurons of the rat visual cortex in vitro. J. Neurosci. 11, 72–84.

Matsuzaki, M., Ellis-Davies, G.C.R., Nemoto, T., Miyashita, Y., Iino,

M., and Kasai, H. (2001). Dendritic spine geometry is critical for

AMPA receptor expression in hippocampal CA1 pyramidal neurons.

Nat. Neurosci. 4, 1086–1092.

Matsuzaki, M., Honkura, N., Ellis-Davies, G.C., and Kasai, H. (2004).

Structural basis of long-term potentiation in single dendritic spines.

Nature 429, 761–766.

McEliece, R. (1977). The Theory of Information and Coding: A Mathematical Framework for Communication (London: Addison-Wesley).

McEliece, R.J., Posner, E.C., Rodemich, E.R., and Venkatesh, S.S.

(1987). The capacity of the Hopfield associative memory. IEEE

Trans. Inform. Theory IT-33, 461–482.

McGaugh, J.L. (2000). Memory—a century of consolidation. Science

287, 248–251.

Mitchison, G. (1991). Neuronal branching patterns and the economy

of cortical wiring. Proc. R. Soc. Lond. B Biol. Sci. 245, 151–158.

Morris, R.G. (2003). Long-term potentiation and memory. Proc. R.

Soc. Lond. B Biol. Sci. 358, 643–647.

Murthy, V.N., Schikorski, T., Stevens, C.F., and Zhu, Y. (2001). Inac-

tivity produces increases in neurotransmitter release and synapse

size. Neuron 32, 673–682.

Newman, C. (1988). Memory capacity in neural network models: rig-

orous lower bounds. Neural Netw. 1, 223–238.

Nusser, Z., Lujan, R., Laube, G., Roberts, J.D., Molnar, E., and Somogyi, P. (1998). Cell type and pathway dependence of synaptic AMPA receptor number and variability in the hippocampus. Neuron 21, 545–559.

O'Connor, D.H., Wittenberg, G.M., and Wang, S.S.-H. (2005). Graded bidirectional synaptic plasticity is composed of switch-like unitary events. Proc. Natl. Acad. Sci. USA 102, 9679–9684.

Petersen, C.C., Malenka, R.C., Nicoll, R.A., and Hopfield, J.J. (1998).

All-or-none potentiation at CA3-CA1 synapses. Proc. Natl. Acad.

Sci. USA 95, 4732–4737.

Pierce, J.P., and Mendell, L.M. (1993). Quantitative ultrastructure of

Ia boutons in the ventral horn: scaling and positional relationships.

J. Neurosci. 13, 4748–4763.

Poirazi, P., Brannon, T., and Mel, B.W. (2003). Arithmetic of sub-

threshold synaptic summation in a model CA1 pyramidal cell.

Neuron 37, 977–987.

Polsky, A., Mel, B.W., and Schiller, J. (2004). Computational subunits in thin dendrites of pyramidal cells. Nat. Neurosci. 7, 621–627.

Raastad, M., Storm, J.F., and Andersen, P. (1992). Putative single

quantum and single fibre excitatory postsynaptic currents show

similar amplitude range and variability in rat hippocampal slices.

Eur. J. Neurosci. 4, 113–117.

Ramón y Cajal, S. (1899). Textura del Sistema Nervioso del Hombre y de los Vertebrados (New York: Springer).

Rieke, F., Warland, D., de Ruyter van Steveninck, R., and Bialek, W.

(1997). Spikes: Exploring the Neural Code (Cambridge, MA: The MIT

Press).

Rolls, E., and Treves, A. (1998). Neural Networks and Brain Function

(Cambridge: Oxford University Press).

Root, W.L. (1968). Estimates of ε capacity for certain linear communication channels. IEEE Trans. Inform. Theory IT-14, 361–369.

Rosenmund, C., Clements, J.D., and Westbrook, G.L. (1993). Non-

uniform probability of glutamate release at a hippocampal synapse.

Science 262, 754–757.

Sahai, A., and Mitter, S.K. (2006). The necessity and sufficiency of

anytime capacity for stabilization of a linear system over a noisy

communication link Part I: scalar systems. IEEE Trans. Inform.

Theory. 52, 3369–3395.

Sarpeshkar, R. (1998). Analog versus digital: extrapolating from

electronics to neurobiology. Neural Comput. 10, 1601–1638.

Sayer, R.J., Friedlander, M.J., and Redman, S.J. (1990). The time

course and amplitude of EPSPs evoked at synapses between

pairs of CA3/CA1 neurons in the hippocampal slice. J. Neurosci.

10, 826–836.

Schikorski, T., and Stevens, C.F. (1997). Quantitative ultrastructural analysis of hippocampal excitatory synapses. J. Neurosci. 17, 5858–5867.

Schreiber, S., Machens, C.K., Herz, A.V.M., and Laughlin, S.B.

(2002). Energy-efficient coding with discrete stochastic events.

Neural Comput. 14, 1323–1346.

Shannon, C.E. (1948). A mathematical theory of communication. Bell Syst. Tech. J. 27, 379–423 and 623–656.

Shannon, C.E. (1959). Coding theorems for a discrete source with

a fidelity criterion. IRE National Convention Record 4, 142–163.

Silver, R.A., Lübke, J., Sakmann, B., and Feldmeyer, D. (2003). High-probability uniquantal transmission at excitatory synapses in barrel cortex. Science 302, 1981–1984.

Sjöström, P.J., Turrigiano, G.G., and Nelson, S.B. (2001). Rate, timing, and cooperativity jointly determine cortical synaptic plasticity. Neuron 32, 1149–1164.

Sjöström, P.J., Turrigiano, G.G., and Nelson, S.B. (2003). Neocortical LTD via coincident activation of presynaptic NMDA and cannabinoid receptors. Neuron 39, 641–654.

Smith, J.G. (1971). The information capacity of amplitude- and vari-

ance-constrained scalar Gaussian channels. Inf. Control 18, 203–

219.

Song, S., Sjöström, P.J., Reigl, M., Nelson, S., and Chklovskii, D.B. (2005). Highly nonrandom features of synaptic connectivity in local cortical circuits. PLoS Biol. 3, 0507–0519. 10.1371/journal.pbio.0030068.

Stepanyants, A., Hof, P.R., and Chklovskii, D.B. (2002). Geometry

and structural plasticity of synaptic connectivity. Neuron 34, 275–

288.

Stepanyants, A., and Chklovskii, D.B. (2005). Neurogeometry and

potential synaptic connectivity. Trends Neurosci. 28, 387–394.

Sterling, P., and Matthews, G. (2005). Structure and function of

ribbon synapses. Trends Neurosci. 28, 20–29.

Streichert, L.C., and Sargent, P.B. (1989). Bouton ultrastructure and

synaptic growth in a frog autonomic ganglion. J. Comp. Neurol. 281,

159–168.

Takumi, Y., Ramírez-León, V., Laake, P., Rinvik, E., and Ottersen, O.P. (1999). Different modes of expression of AMPA and NMDA receptors in hippocampal synapses. Nat. Neurosci. 2, 618–624.

Tanaka, J., Matsuzaki, M., Tarusawa, E., Momiyama, A., Molnar, E.,

Kasai, H., and Shigemoto, R. (2005). Number and density of AMPA

receptors in single synapses in immature cerebellum. J. Neurosci.

25, 799–807.

Tchamkerten, A. (2004). On the discreteness of capacity-achieving

distributions. IEEE Trans. Inform. Theory 50, 2773–2778.


Thomson, A.M., and Bannister, A.P. (2003). Interlaminar connections in the neocortex. Cereb. Cortex 13, 5–14.

Thomson, A.M., West, D.C., Wang, Y., and Bannister, A.P. (2002).

Synaptic connections and small circuits involving excitatory and in-

hibitory neurons in layers 2-5 of adult rat and cat neocortex: triple in-

tracellular recordings and biocytin labelling in vitro. Cereb. Cortex

12, 936–953.

Verdú, S. (1990). On channel capacity per unit cost. IEEE Trans. Inform. Theory 36, 1019–1030.

Verdú, S. (2002). Spectral efficiency in the wideband regime. IEEE Trans. Inform. Theory 48, 1319–1343.

von Gersdorff, H., and Borst, J.G. (2002). Short-term plasticity at the calyx of Held. Nat. Rev. Neurosci. 3, 53–64.

Wen, Q., and Chklovskii, D.B. (2005). Segregation of the brain into

gray and white matter: a design minimizing conduction delays.

PLoS Comput. Biol. 1, e78. 10.1371/journal.pcbi.0010078.

Yeow, M.B., and Peterson, E.H. (1991). Active zone organization and

vesicle content scale with bouton size at a vertebrate central syn-

apse. J. Comp. Neurol. 307, 475–486.

Zador, A. (1998). Impact of synaptic unreliability on the information

transmitted by spiking neurons. J. Neurophysiol. 79, 1219–1229.

Zhou, Q., and Poo, M.M. (2004). Reversal and consolidation of activ-

ity-induced synaptic modifications. Trends Neurosci. 27, 378–383.

Zhou, Q., Homma, K.J., and Poo, M.M. (2004). Shrinkage of dendritic spines associated with long-term depression of hippocampal synapses. Neuron 44, 749–757.
