Neuron 52, 409–423, November 9, 2006 ©2006 Elsevier Inc. DOI 10.1016/j.neuron.2006.10.017
Viewpoint
Optimal Information Storage in Noisy Synapses under Resource Constraints
Lav R. Varshney,1,2 Per Jesper Sjöström,3
and Dmitri B. Chklovskii2,*
1Department of Electrical Engineering and Computer Science
Massachusetts Institute of Technology
Cambridge, Massachusetts 02139
2Cold Spring Harbor Laboratory
Cold Spring Harbor, New York 11724
3Wolfson Institute for Biomedical Research and
Department of Physiology
University College London
London WC1E 6BT
Experimental investigations have revealed that synap-
ses possess interesting and, in some cases, unex-
pected properties. We propose a theoretical frame-
work that accounts for three of these properties:
typical central synapses are noisy, the distribution of
synaptic weights among central synapses is wide,
and synaptic connectivity between neurons is sparse.
We also comment on the possibility that synaptic
weights may vary in discrete steps. Our approach is
based on maximizing information storage capacity of
neural tissue under resource constraints. Based on
previous experimental and theoretical work, we use
volume as a limited resource and utilize the empirical
relationship between volume and synaptic weight. So-
lutions of our constrained optimization problems are
not only consistent with existing experimental mea-
surements but also make nontrivial predictions.
As synapses play central roles in the two principal tasks
of the brain, information processing and information
storage (Ramón y Cajal, 1899), their properties have
been the subject of extensive experimentation. Out of
many important synaptic properties revealed over the
years, we focus on the following three. First, synaptic
connectivity is sparse not only in the brain in general,
but also in local circuits. In other words, the probability
of finding a synaptic connection between a randomly
chosen pair of excitatory neurons, even nearby neurons,
is much less than 1 (Holmgren et al., 2003; Isope and
Barbour, 2002; Markram, 1997; Markram et al., 1997;
Mason et al., 1991; Song et al., 2005; Thomson and
Bannister, 2003; Thomson et al., 2002). Second, typical
central synapses are noisy devices. Due, for example,
to probabilistic transmitter release, firing of the presyn-
aptic neuron occasionally fails to evoke an excitatory
postsynaptic potential (EPSP). Moreover, the amplitude
of the EPSP varies from trial to trial (Allen and Stevens,
1994; Hessler et al., 1993; Isope and Barbour, 2002;
Mason et al., 1991; Raastad et al., 1992; Rosenmund
et al., 1993; Sayer et al., 1990). Third, while the majority
of synaptic weights are relatively weak (mean EPSP <
1 mV), the weight distribution is broad with a notable
and Barbour, 2002; Markram et al., 1997; Mason et al.,
1991; Sayer et al., 1990; Sjöström et al., 2001).
Although a unified theoretical framework capable of
accounting for all of these properties does not exist,
several of these properties have been addressed previ-
ously. The noisiness of typical central synapses seemed
particularly puzzling since synapses act as conduits of
information between neurons (Koch, 1999). Several the-
oretical studies have considered the impact of synaptic
noise on information transmission through a synapse,
generally in the context of sensory processing (Gold-
man, 2004; Levy and Baxter, 2002; Manwani and Koch,
2000; Zador, 1998). These papers have shown that, un-
der some conditions or under some constraints, synap-
tic noisiness facilitates the efficiency of information
transmission. Moreover, Laughlin et al. (1998) have
pointed out that splitting information and transmitting
it over several less reliable but metabolically cheaper
channels reduces energy requirements. Adding infor-
mation channels invokes costs associated with building
and maintaining those channels (Levy and Baxter, 1996;
Schreiber et al., 2002), which must also be taken into ac-
count (Sarpeshkar, 1998). In a separate line of inquiry,
Brunel et al. (2004) explain the sparseness of synaptic
connectivity and the distribution of synaptic weights in
the cerebellum by maximizing the storage capacity of
a perceptron network.
apses by developing a theoretical framework based on
the role of synapses as mechanisms of information stor-
age, rather than their dual role in transmitting informa-
tion between neurons. It is widely believed that long-
alteration in the strength of existing synapses (Hebb,
1949; McGaugh, 2000), through long-term potentiation
(LTP) and long-term depression (LTD) (Lynch, 2004;
Morris, 2003). Memories are retrieved by electrical activ-
ity of neurons that ‘‘reads out’’ the pattern of synaptic
connectivity between them. Thus a synaptic memory
the present to the future (Figure 1). Although information
storage is well recognized as a case of a general communication
system (Csiszár and Körner, 1997; Eldridge,
1963; Immink et al., 1998) and information theory has
been successfully applied in neuroscience (Rieke et al.,
1997), the application of information theory to the analysis
of synapses as memory elements has received little attention.
Our theoretical analysis is based on maximizing infor-
mation storage capacity of synapses under resource
constraints. Generally, information storage capacity of
a system depends on the signal-to-noise ratio (SNR);
in the case of synapses, this is a ratio between average
synaptic weight and average noise. It would seem that
the best strategy for increased information storage
capacity would be to increase the synapse SNR; how-
ever, this increase in SNR comes at a cost. For given
noise, increasing the SNR requires increasing the aver-
age synaptic weight. But the weight of an individual syn-
aptic contact is positively correlated with its volume
(comprising the combined volume of the spine head
and axonal bouton) (Kasai et al., 2003; Lisman and Har-
ris, 1993; Matsuzaki et al., 2001; Murthy et al., 2001;
Nusser et al., 1998; Schikorski and Stevens, 1997; Ta-
kumi et al., 1999; Tanaka et al., 2005). As the weight of
a synaptic connection is composed of the weights of in-
dividual synaptic contacts and its volume is the sum of
contact volume, the correlation between the weight
and the volume should hold for the synaptic connection
as a whole. The volume, however, is a costly resource
(Cherniak et al., 1999; Chklovskii, 2004; Hsu et al.,
mation storage capacity should be maximized under
Here, we cast the problem of long-term memory into
the framework of information theory and deduce struc-
tural and connectivity properties of synapses that lead
as the maximization of the brain’s information storage
capacity under constrained cost, quantified by synaptic
volume. Note that we consider memory storage from
a physical perspective, looking at the information stor-
age density of neural tissue. Other than presupposing
that volume is a constrained resource, our approach re-
self, such as a particular network organization or certain
activity patterns. Previous work, however, examined the
memory storage capacity of particular neural network
models (Brunel et al., 2004; Gardner, 1987; McEliece
et al., 1987; Newman, 1988; Rolls and Treves, 1998).
work designs and properties, thereby providing results
that are in part due to the a priori assumptions.
The paper is organized as follows. In section I, we
specify our model and formalize the empirical relation-
ship between synaptic weight and synaptic volume.
This preliminary step allows a quantitative tradeoff be-
tween storage capacity and cost. In section II, we show
that synaptic connections should, on average, have
small volume and consequently be noisy to maximize information
storage capacity per unit volume. In section
III, we determine the distribution of synaptic weights—
not just the average structural property of synapses—
that maximizes capacity for a particular synapse model.
This optimal distribution includes many zero-weight
connections, or potential synapses, which is in accor-
dance with experimental observations of sparse synap-
tic connectivity. In section IV, we use an experimentally
determined distribution of synaptic weights, as well as
synaptic noise, to compute the cost function for which
the information storage system operates optimally.
the synaptic weight cost relationship specified in sec-
may perform almost optimally or, in a slightly different
yet naturally constrained model, better than continuous
valued synaptic states. Finally, section VI compares our
theoretical predictions with known experimental data
and suggests further tests of the theory.
I. Relationship between Synaptic Weight and Volume
We start by formulating the model of a synapse in the in-
formation storage context, which is based on existing
experimental observations. Although synaptically con-
nected pairs of cortical neurons usually share multiple
synaptic contacts (Kalisman et al., 2005; Koester and
Johnston, 2005; Markram et al., 1997; Silver et al.,
contacts collectively as a synapse. Such a definition
is motivated by electrophysiological measurements,
which record synaptic weight of all the contacts
We assume that information is stored in the synaptic
weight, A, of each synapse. The weight can be obtained
by averaging EPSP amplitude measured in multiple tri-
als in response to the firing of a presynaptic neuron.
Then the standard deviation of the EPSP amplitude
from trial to trial in a given synapse is the noise ampli-
tude, AN. As one might expect from the Poisson model
of synaptic release (Bekkers and Stevens, 1995; del
Castillo and Katz, 1954), the noise amplitude increases
sublinearly with the synaptic weight. Recent measure-
ments suggest a power law with the exponent about
0.38 (Markram et al., 1997; Song et al., 2005) (see
Figure S1 in the Supplemental Data available online).
Figure 1. Information Theoretic Models of Communication and Memory
(Left) Shannon’s schematic diagram of a general communication system (Shannon, 1948). Here, incoming information is denoted A, whereas the
information transmitted through and distorted by the channel is labeled B. (Right) Schematic diagram of a memory system cast as a communi-
is indicated by B. The various sources of noise have been explicitly notated. Storage noise refers to noise that arises in the storage process, in
situ noise refers to noise that perturbs the information while it is stored, and retrieval noise refers to noise in the retrieval process.
The volume of a synapse is composed of individual
synaptic contact’s volume, which, in turn, correlates
with the contribution of each synaptic contact to synap-
correlates with many ultrastructural characteristics,
such as the number and area of active zones, number
of vesicles, area of the postsynaptic density, and the
number of receptors (Lisman and Harris, 1993; Murthy
et al., 2001; Nusser et al., 1998; Pierce and Mendell,
1993; Schikorski and Stevens, 1997; Streichert and Sar-
and Peterson, 1991). Physiologically, individual synaptic
contact’s volume correlates with the synaptic weight
(Kasai et al., 2003; Matsuzaki et al., 2001). In fact, an in-
pany LTP (Matsuzaki et al., 2004; Kopec et al., 2006),
whereas LTD may result in the converse volume de-
crease (Zhou et al., 2004).
Because contributions of individual contacts to syn-
aptic weight may add up linearly (Cash and Yuste,
1999), the volume of a synapse correlates with the syn-
aptic weight. Indeed, a neuron can be viewed as a single
computational unit (Chklovskii et al., 2004), as there is
evidence that multiple synaptic contacts within a connected
pair of neurons have correlated release probability
(Koester and Johnston, 2005) and that the total synaptic
connection weight correlates with the number of
synaptic contacts (Kalisman et al., 2005). Alternatively,
the integrative compartment may be smaller—such as
a single dendritic branch (Poirazi et al., 2003; Polsky
et al., 2004)—and individual synaptic contacts may in
addition vary their weights independently. In this case,
our model would have to be modified.
As the noise amplitude, AN, is related by a power law
to the mean EPSP amplitude, A, which is strongly corre-
lated with the synapse volume, V, we can formulate the
following scaling relationship:
where VN is the volume of a synapse with an SNR of 1.
Although existing experimental measurements (Kasai
et al., 2003; Matsuzaki et al., 2001; Murthy et al., 2001;
Schikorski and Stevens, 1997; Song et al., 2005; Takumi
et al., 1999; Tanaka et al., 2005) support Equation 1.1,
they are not sufficient to establish the value of the exponent α.
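A form of the scaling relationship that is consistent with the definitions in this section is sketched below. It is a reconstruction from the surrounding text, not a verbatim quotation, and it assumes that SNR is the squared ratio of A to AN (as defined in section II), so that an exponent of 2 makes volume proportional to SNR.

```latex
\frac{V}{V_N} = \left(\frac{A}{A_N}\right)^{\alpha} \tag{1.1}
```

With this normalization, V equals VN exactly when A equals AN, that is, when the SNR is 1.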
II. Noisy Synapses Maximize Information Storage
In this section, we deduce optimal average synaptic
weight and volume by maximizing information storage
capacity per unit volume. We invoke the synaptic
weight/volume relationship formulated in the previous
section (Equation 1.1) with α = 2; other cases will be considered
in later sections. For α = 2, the problem of maximizing
information storage capacity in a given volume
reduces to the well-studied problem of maximizing
channel capacity for a given input power. When the
channel contributes additive white Gaussian noise
(AWGN), such a problem is exactly solvable.
In our context, information is stored in the alteration of
synaptic weights and retrieved by electrical activity.
Then each synapse corresponds to a channel usage
with information encoded in its weight. Maximum stor-
age capacity is achieved when synaptic weights are un-
correlated. The retrieval noise is manifested in fluctua-
tions of EPSP from trial to trial. For concreteness, we
assume here that the noise is Gaussian with a given var-
iance; we will argue at the end of this section that the
conclusions hold for other noise models.
Information storage capacity per synapse (measured
in nats rather than bits) is given by the expression de-
rived by Shannon (1948) for the AWGN channel:
is the average SNR among synapses. SNR for each syn-
apse is defined as the square of the mean EPSP ampli-
tude divided by the trial-to-trial variance of EPSP ampli-
positive values.) Using Equation 1.1, we can rewrite in-
formation storage capacity in terms of volume:
where ⟨V⟩ is the average synapse volume.
As volume is a scarce resource, information storage
capacity is likely to be optimized on a per-volume basis
(see Introduction). For example, placing two or more
smaller synapses (connecting different pairs of neurons)
in the place of one larger synapse may increase memory
capacity. Then the total storage capacity of a unit vol-
ume of neural tissue is
where V0 is the accessory volume necessary to support
a synapse. Accessory volume includes the volume of
wiring (axons and dendrites), glia, and perhaps extracel-
lular space. Information storage capacity as a function
of the size of the synapse, the relationship in Equation
2.3, is plotted in Figure 2A for different values of V0.
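For reference, the three expressions referred to in this section as Equations 2.1 to 2.3 can be written as follows. This is a sketch reconstructed from the verbal definitions above (Shannon's AWGN capacity in nats, the substitution of Equation 1.1 with an exponent of 2, and division by the total volume per synapse), so the exact typography of the original may differ.

```latex
I_{\mathrm{synapse}} = \frac{1}{2}\ln\!\left(1 + \left\langle \frac{A^{2}}{A_N^{2}} \right\rangle\right) \tag{2.1}

I_{\mathrm{synapse}} = \frac{1}{2}\ln\!\left(1 + \frac{\langle V\rangle}{V_N}\right) \tag{2.2}

I_{\mathrm{volume}} = \frac{I_{\mathrm{synapse}}}{\langle V\rangle + V_0}
  = \frac{1}{2\left(\langle V\rangle + V_0\right)}\ln\!\left(1 + \frac{\langle V\rangle}{V_N}\right) \tag{2.3}
```

Setting the derivative of the last expression with respect to the average synapse volume to zero gives the condition (⟨V⟩ + V0)/(VN + ⟨V⟩) = ln(1 + ⟨V⟩/VN) for the peak.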
Optimal storage capacity is achieved at the maximum
of the Ivolume-versus-⟨V⟩/VN curve in Figure 2A. The maximum
can be found by setting the derivative of Equation
Figure 2B shows the dependence of information storage
capacity Ivolume (peak height in Figure 2A) and optimal
synaptic volume ⟨V⟩ (horizontal coordinate of the peak
in Figure 2A) on the accessory volume V0. As would be
expected, maximum information storage capacity per
unit volume is achieved when the accessory volume V0
is the smallest possible. In this regime, average synapse
volume ⟨V⟩ is much less than VN and, according to
Equation 1.1, synapses should therefore be noisy.
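The dependence of the optimum on accessory volume can be checked numerically. The following is a minimal sketch, assuming the AWGN form of capacity per unit volume with an exponent of 2 in Equation 1.1 and with all volumes expressed in units of VN; the function names `capacity_per_volume` and `optimal_volume` are illustrative and do not come from the paper.

```python
import math


def capacity_per_volume(v, v0):
    """Capacity per unit volume in nats, with v = <V>/VN and v0 = V0/VN.

    Assumes the AWGN per-synapse capacity 0.5 * ln(1 + v).
    """
    return 0.5 * math.log1p(v) / (v + v0)


def optimal_volume(v0, lo=1e-9, hi=1e6, iters=200):
    """Locate the peak of capacity_per_volume for v0 > 0.

    The derivative vanishes where (v + v0) / (1 + v) = ln(1 + v).
    g(v) is strictly decreasing, positive left of the peak and
    negative right of it, so bisection on a log scale is sufficient.
    """
    def g(v):
        return (v + v0) / (1.0 + v) - math.log1p(v)

    for _ in range(iters):
        mid = math.sqrt(lo * hi)
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)


if __name__ == "__main__":
    for v0 in (0.01, 0.1, 1.0, 10.0):
        v = optimal_volume(v0)
        c = capacity_per_volume(v, v0)
        print(f"V0/VN = {v0:5.2f}  optimal <V>/VN = {v:.4f}  capacity = {c:.4f}")
```

Under these assumptions the optimal average volume shrinks and the peak capacity grows as the accessory volume is reduced, which is the trend described for Figure 2B. For V0 equal to VN the condition reduces to ln(1 + v) = 1.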
In reality, the accessory volume may not be infinitesi-
mal, as this would affect system functionality adversely.
For example, there is a hard limit on how thin axons can
be (Faisal et al., 2005). Also, reducing wiring volume may
increase conduction time delays and signal attenuation
(Chklovskii et al., 2002). In fact, delay and attenuation
are optimized when the wiring volume is of the same or-
der as the volume of synapses (Wen and Chklovskii,
2005), which happens when they are of the order of VN.
Then the optimal performance—in terms of jointly
tion time delay and attenuation—is achieved when aver-
of VN. In either case, we arrive at the conclusion that typ-
ical synapses should be noisy, in agreement with exper-
The advantage of having greater numbers of smaller
synapses is valid not only for the AWGN model that
was considered above, but also for many reasonable
noise and cost models. For these other models, individual
synapse channel capacity, Isynapse, is nondecreasing
and logarithmic in SNR. Thus, the inversely linear ⟨V⟩
term that arises from the number of synapses in the
unit volume outpaces the logarithmic ⟨V⟩ term that
arises from individual synapse capacity, and so total capacity
decreases with increasing ⟨V⟩ for large ⟨V⟩.
An alternate way to see that the advantage of having
greater numbers of smaller synapses extends to other
reasonable noise models is through the concavity of
the capacity cost function of information theory. The ca-
pacity cost function generalizes channel capacity by im-
posing average cost constraints on the channel inputs.
Like channel capacity, it is the maximum rate at which
one can transmit information over a channel while still
achieving arbitrarily small probability of error; however,
now the optimization is constrained by cost. This
function is nondecreasing and concave downward
(McEliece, 1977; Shannon, 1959), which means that the
slope (capacity/cost) is larger at lower costs. If there
are no zero cost symbols, the capacity per unit cost is
maximized at the average cost for which a line con-
strained to pass through the origin has its point of tan-
gency to the capacity cost function. Such tangency
points correspond to the location of the peaks in Figure 2A.
If there is a zero cost symbol, however, then
the optimum is for zero average cost (V0 = 0 curve in Figure 2A).
In many cases, it is difficult to find the optimum
capacity per unit cost analytically (Verdú, 1990). There
is, however, a numerical algorithm that can be used for
such a computation (Jimbo and Kunisawa, 1979). Simi-
lar mathematical arguments have been used in the con-
et al., 1998) and ion channels (Schreiber et al., 2002)—
reduces metabolic costs.
In this section, we showed that—provided the acces-
sory volume needed to support a synapse is small—
numerous small and noisy synapses possess greater in-
formation storage capacity per unit volume than a few
large and reliable synapses. This result may help explain
why central synapses typically are unreliable (Allen and
Stevens, 1994; Hessler et al., 1993; Isope and Barbour,
2002; Mason et al., 1991; Raastad et al., 1992; Rosenmund
et al., 1993; Sayer et al., 1990).
III. Optimal Distribution of Synaptic Weights
in the Discrete-States Model
Having established that synapses should be small and
noisy on average, we next examine how volume and
In the AWGN model used in section II, the capacity-
achieving input distribution is also Gaussian (Shannon,
1948), and the synaptic volume is distributed exponentially.
If the noise amplitude AN is constant, synaptic
weight has a Gaussian distribution, as previously sug-
gested (Brunel et al., 2004). If, on the other hand, AN
scales as a power of A (Figure S1), the synaptic weight
distribution is a stretched (or compressed) exponential.
Here, exponential and Gaussian distributions are two
different, special cases.
However, it is not clear whether these predictions
from the AWGN model can be taken at face value. First,
itive signals, whereas synaptic weight is positive for excitatory
synapses. Second, the Gaussian noise assumption
is unlikely to hold, especially if synaptic weight must
be non-negative. Third, synaptic volume may not scale
as the synaptic weight SNR squared.
Figure 2. Results of the AWGN Model
(A) Information storage capacity per volume, VN, of neural tissue as
a function of normalized average synapse volume. The relationship
between signal-to-noise ratio and volume for this plot uses accessory
volume, V0, values of 0, 1, and 10, normalized with respect to
VN. When V0 = 0, the maximum storage capacity per unit volume occurs
when average synapse volume is infinitesimal. When V0 > 0, the
finite maximum storage capacity per unit volume occurs at some
non-zero normalized synapse volume.
(B) Blue line: Maximum information storage capacity per volume VN
the accessory volume V0. Red line: the corresponding average synapse
volume ⟨V⟩ (horizontal coordinate of the peak in [A]).
We therefore consider a different, discrete-states
model, where the cost function can be chosen arbitrarily
and the synaptic weight is non-negative, but which still
yields an exactly solvable optimization problem. The
reason an exact solution can be found is that the noise is
uous distribution of synaptic weights, we consider a set
of discrete synaptic states, with each state representing
the range of weights in the continuous distribution that
could be confused on retrieval due to noise. Then the
difference in synaptic weight between adjacent states
Ai and Ai + 1 is given by the two noise amplitudes,
AN(Ai) + AN(Ai + 1). From the information theoretic point
of view, each state is viewed as a symbol from an alpha-
bet characterized by a different cost (Figure 3A).
Such conversion of the noisy continuous-valued input
channel into a zero-error, discrete-valued input channel
is a convenient approximation (Kolmogorov and Tiho-
of the noiseless channel reduces to the self-information
of the channel input distribution (or, equivalently, chan-
nel output distribution). By resorting to this approxima-
tion, we do not wish to imply that synaptic weights in
the brain necessarily vary in discrete steps. In section
VI, we will validate this approach by comparing its pre-
dictions to the predictions from a continuous channel
model (section IV).
Since the self-information is identical to entropy, the
maximization of information storage capacity per vol-
ume reduces to entropy maximization per volume,
a standard problem from statistical physics (see Exper-
ical problem has been solved in the context of neuronal
communication by the spike code (Balasubramanian
et al., 2001; de Polavieja, 2002, 2004). We consider
a set of synaptic states, i, characterized by the EPSP
amplitudes, Ai, and volume (or some other generalized
cost), Vi (Figure 3A). We search for the probability distribution
over synaptic states, pi, that maximizes information
storage capacity (measured in nats):
per average volume of a synapse, V̄.
Note that the average synaptic volume, V̄, includes
the accessory volume, V0, which was excluded from
the definition of ⟨V⟩ used in the previous section.
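The optimization just described can be written out explicitly. The sketch below assumes a standard stationarity argument under the normalization constraint on the state probabilities; the symbol Z is introduced here for illustration only, and β denotes the constant fixed by the normalization condition.

```latex
\max_{\{p_i\}} \; \frac{I}{\bar V}, \qquad
I = -\sum_i p_i \ln p_i, \qquad
\bar V = \sum_i p_i V_i, \qquad
\sum_i p_i = 1 .

% Stationarity of I / \bar V under the normalization constraint
% gives a Boltzmann-like form:
p_i = \frac{e^{-\beta V_i}}{Z}, \qquad
\beta = \frac{I}{\bar V}, \qquad
Z = \sum_i e^{-\beta V_i} .

% Substituting back, I = \beta \bar V + \ln Z, so \beta = I / \bar V
% requires \ln Z = 0:
p_i = e^{-\beta V_i}, \qquad \sum_i e^{-\beta V_i} = 1 .
```

Read this way, β is simultaneously the decay constant of the state probabilities and the information storage capacity per unit volume at the optimum.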
We show in the Experimental Procedures that the
probability distribution over synaptic states, pi, that
maximizes information capacity per volume is given by
where β is defined by the condition
Motivated by experimental observations (Kopec et al.,
2006), we assume that synaptic state volume is distributed
equidistantly, i.e., the volume of the ith synaptic
state is given by
Then the average volume per potential synapse (in-
cluding accessory volume, V0), defined as the total vol-
ume divided by the number of potential synapses (in-
cluding actual ones), can be expressed analytically
(see Experimental Procedures) as:
To allow comparison with empirical measurements
volume of actual synapses (see Experimental Proce-
dures), i.e., states with i > 0, and excluding accessory
volume, V0 (Figure 3B):
The optimal average volume of actual synapses in-
creases with the accessory volume. This result has an
intuitive explanation: once the big investment in wiring
(V0) has already been made, it is advantageous to use
bigger synapses that have higher SNR.
Figure 3. Discrete-States Model
(A) Synapses are modeled by a set of discrete noiseless synaptic
states characterized by mean EPSP amplitude Ai and volume Vi.
The difference in synaptic weight between adjacent states is
AN(Ai) + AN(Ai + 1).
(B) The average volume of actual synapses ⟨V⟩i>0 (blue line) and the
fraction of synapses in i > 0 states, i.e., the filling fraction, f = (1 - p0)
(red line) as a function of accessory synapse volume normalized by
VN. Dashed line corresponds to the equipartition of volume into syn-
apses and wires. The solution satisfying competing requirements of
maximizing information storage capacity and minimizing conduc-
tion delays must be at f < 0.5.
The ratio between the number of actual synapses and
the number of potential synapses (including actual) is
called the filling fraction, f. In our model the filling frac-
tion is just the fraction of synapses in states i > 0 and
is given by (see Experimental Procedures):
and plotted in Figure 3B.
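The behavior of the filling fraction and of the average volume of actual synapses can be explored with a short numerical sketch. It assumes state probabilities of the form exp(-β Vi) normalized to sum to 1, and it assumes equidistant state volumes Vi = V0 + 2·i·VN; the spacing 2VN is an assumption chosen to match the statement in the text that most actual synapses have volume 2VN. The names `solve_beta` and `discrete_states_summary` are illustrative.

```python
import math


def solve_beta(v0, vn=1.0, iters=200):
    """Solve sum_i exp(-beta * V_i) = 1 for beta, with V_i = v0 + 2*i*vn.

    Summing the geometric series turns the condition into
    exp(-beta * v0) + exp(-2 * beta * vn) = 1, which has one root.
    """
    def g(beta):
        return math.exp(-beta * v0) + math.exp(-2.0 * beta * vn) - 1.0

    lo, hi = 1e-12, 1.0
    while g(hi) > 0:  # expand the bracket until the root is enclosed
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def discrete_states_summary(v0, vn=1.0, n_states=200):
    """Return beta, filling fraction, mean volumes, and entropy per volume."""
    beta = solve_beta(v0, vn)
    volumes = [v0 + 2.0 * i * vn for i in range(n_states)]
    probs = [math.exp(-beta * v) for v in volumes]
    filling = 1.0 - probs[0]  # fraction of synapses in states i > 0
    mean_total = sum(p * v for p, v in zip(probs, volumes))
    # Mean volume of actual synapses, excluding accessory volume v0.
    mean_actual = sum(p * (v - v0) for p, v in zip(probs[1:], volumes[1:])) / filling
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    return {
        "beta": beta,
        "total_probability": sum(probs),
        "filling_fraction": filling,
        "mean_actual_volume": mean_actual,
        "capacity_per_volume": entropy / mean_total,
    }


if __name__ == "__main__":
    for v0 in (0.01, 0.1, 1.0, 2.0, 10.0):
        s = discrete_states_summary(v0)
        print(v0, round(s["filling_fraction"], 4), round(s["mean_actual_volume"], 4),
              round(s["capacity_per_volume"], 4))
```

Under these assumptions, small accessory volume gives a filling fraction well below 1 and an average actual synapse volume close to 2VN, while the capacity per unit volume equals β. The truncation at 200 states is adequate for the small values of V0/VN shown here and would need to be enlarged for much larger accessory volumes.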
Information storage capacity per volume can be cal-
model, information storage capacity increases mono-
tonically with decreasing accessory volume. Unlike the
AWGN model, the growth of information storage capac-
with decreasing accessory volume, V0, optimal information
storage is achieved when V0 is as small as possible.
In this limit, the filling fraction, f, is much less than 1, as
illustrated in Figure 3B. This prediction is consistent
with empirical observations of sparse connectivity. In
addition, according to Figure 3B most actual synapses
have volume 2VN, and thus have SNR of order 1 (Equa-
tion 1.1). This prediction is in agreement with the exper-
imentally established noisiness of typical synapses.
Although local cortical circuits are sparse and typical
synapses are noisy, the filling fraction is not infinitesimal.
One explanation for this fact, which was discussed
in the previous section, is that very small V0 affects
system functionality adversely. The condition that
(Wen and Chklovskii, 2005) implies that ⟨V⟩ ~ V0/f. This
condition is illustrated in Figure 3B by a dashed line in-
tersecting the blue line. Then the competing desiderata
of maximizing information storage and minimizing conduction
delays should yield a value of V0 less than at
the intersection. The corresponding filling fraction is
less than half, but not infinitesimal.
By using Equations 3.3 and 1.1, we can find the prob-
Such a distribution is called a stretched (or com-
pressed) exponential and is compared with experimen-
tal data in section VI. In the continuum limit, when the
probability changes smoothly between states, we can
convert Equation 3.9 to the probability density. Consid-
ering that there should be one synaptic state per two
noise amplitudes, 2AN, the probability density of the
EPSP distribution is given by
Interestingly, the explicit consideration of noise does
not alter the result, which follows from Equation 3.3,
that for V0/VN → 0 optimum information storage is
achieved by using mostly the i = 0 state, with i = 1
used with exponentially low frequency. If V0 = 0, this
type of problem can be solved exactly (Verdú, 1990,
2002), and the information storage capacity is maxi-
mized when, in addition to the zero cost symbol, only
one other symbol is chosen. The additional symbol is
chosen to maximize the Kullback-Leibler (KL) diver-
gence between conditional probabilities of that symbol
and of the zero cost symbol divided by the cost of the
additional symbol. If V0 > 0, however, the problem of optimizing
information storage capacity cannot be solved
analytically, prompting us to pursue a reverse approach
discussed in the next section.
IV. Calculation of the Synaptic Cost Function
from the Distribution of Synaptic Weights
The problem of directly and analytically finding the
capacity-achieving input distribution and the channel
capacity for a specified cost function is often rather
difficult and is only known in closed form for certain special
cases. In most cases, the channel capacity and
capacity-achieving input distribution are found using
numerical algorithms (Arimoto, 1972; Blahut, 1972). In
neuroscience, this algorithm was used in the context
of optimal information transmission by the spike
code (Balasubramanian et al., 2001; de Polavieja,
An alternative way to attack the optimization problem
is to specify the channel noise distribution and the channel
input distribution and then determine the channel in-
pacity (Csiszár and Körner, 1997; Gastpar et al., 2003).
This methodology does not seem to have been used for
neuroscience investigations, other than for a brief look
at sensory processing (Gastpar, 2003). Although this
method inverts the problem specification, it seems rea-
sonable if we are not sure of what the channel input cost
function is (e.g., we do not know what α in Equation 1.1
may then be examined for relevance to the problem at
As before, we consider memory as a communication
channel (Figure 1). Information is stored in the input variable
A, the retrieved (output) value of which is designated
B. Gastpar et al. (2003) show that for a fixed channel
input distribution p(A) and channel noise p(B|A), the
system is optimal—in the sense of operating at capacity
cost—if the cost function is of the form
where v > 0 and v0 are arbitrary constants. Furthermore,
D(·‖·) denotes the KL divergence, which quantifies the
difference between the two probability distributions.
KL divergence is zero if and only if the two distributions
are identical. Note that the computed cost function is
optimal for any accessory volume cost.
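The cost function described in this paragraph can be written compactly as follows. This is a sketch reconstructed from the verbal description (a positive scale v, an offset v0, and the KL divergence between the conditional output distribution and the marginal output distribution), so the notation of the original Equation 4.1 may differ in detail.

```latex
V(A) = v \, D\!\left( p(B \mid A) \,\big\|\, p(B) \right) + v_0 ,
\qquad
D\!\left( p \,\|\, q \right) = \int p(b) \, \ln \frac{p(b)}{q(b)} \, db ,
\qquad
p(B) = \int p(B \mid A) \, p(A) \, dA .
```

Because v0 enters only as an additive constant, a change in accessory volume shifts the cost of every input equally, which is why the computed cost function remains optimal for any accessory volume.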
First, we demonstrate that Equation 4.1 is valid in the
cases considered in previous sections. For the AWGN
model considered in section II, the input distribution
and the noise are Gaussian. Then the output distribution
is Gaussian as well. By substituting these distributions
into Equation 4.1, we can calculate the synaptic cost
function explicitly (see Experimental Procedures):
find that the cost function is quadratic in synaptic
weight, as was initially assumed in section II, thus vali-
dating Equation 4.1 for this case.
Another example that validates Equation 4.1 is the
model of discrete noiseless synaptic states considered
in section III. In this case, p(B = Ai | Aj) = δij and, using
Equation 4.1, we find that the cost function is given
by the logarithm of the input (or, alternatively, output)
distribution (see Experimental Procedures):
This is exactly what Equation 3.3 would predict, thus
providing another validation for Equation 4.1.
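Both validation cases can be reproduced numerically. The following is a minimal sketch, assuming the cost function has the form v·D(p(B|A) ‖ p(B)) + v0; the helper names `kl_divergence`, `cost_function`, and `gaussian_cost` are illustrative, and the Gaussian case uses the standard closed-form KL divergence between two normal distributions.

```python
import math


def kl_divergence(p, q):
    """D(p || q) in nats for two discrete distributions on the same support."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def cost_function(p_input, channel, v=1.0, v0=0.0):
    """Cost v * D(p(B|A_j) || p(B)) + v0 for every input symbol j.

    channel[j][k] is p(B = b_k | A = a_j); v and v0 are free constants.
    """
    n_out = len(channel[0])
    p_output = [
        sum(p_input[j] * channel[j][k] for j in range(len(p_input)))
        for k in range(n_out)
    ]
    return [v * kl_divergence(row, p_output) + v0 for row in channel]


def gaussian_cost(a, signal_var, noise_var, v=1.0, v0=0.0):
    """AWGN case: KL between N(a, noise_var) and N(0, signal_var + noise_var)."""
    total = signal_var + noise_var
    kl = 0.5 * math.log(total / noise_var) + (noise_var + a * a) / (2.0 * total) - 0.5
    return v * kl + v0


if __name__ == "__main__":
    # Noiseless discrete states: the channel is the identity matrix.
    p = [0.6, 0.3, 0.1]
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    print(cost_function(p, identity))      # matches -ln(p_i)
    print([-math.log(pi) for pi in p])

    # AWGN: the cost grows quadratically with the stored amplitude.
    for a in (0.0, 1.0, 2.0):
        print(a, gaussian_cost(a, signal_var=1.0, noise_var=1.0))
```

In the noiseless case the cost of each state comes out as the negative logarithm of its probability, and in the Gaussian case the amplitude-dependent part of the cost is proportional to the square of the amplitude, in line with the two checks described in the text.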
Next, we use Equation 4.1 to calculate the synaptic
cost function from experimentally measured distribu-
tions of synaptic weights and noise. We use the dataset
from Sjöström et al. (2001, 2003), also analyzed in Song
et al. (2005), where EPSPs were recorded in several con-
calculation, we rely on the assumption that information
stored at a synapse, A, can be identified by the mean
EPSP amplitude. Then, the conditional density, p(B|A),
is estimated for each synapse as the distribution of
EPSP amplitudes across trials (Figure 4A). The marginal
density, p(B), is the distribution of EPSP amplitude over
all trials and synapses. By substituting these distribu-
tions into Equation 4.1 we find estimates of the cost
function, V(A), for each synapse (Figure 4B). A power
law with exponent 0.48 provides a satisfactory fit. Error
bars are obtained from a bootstrapping procedure
(see Experimental Procedures).
V. Discrete Synapses May Provide Optimal Information Storage
sections II and IV might have given the impression that
the optimal distribution of synaptic strength must be
age in synapses by the AWGN channel with average
power constraint, for which the optimal, capacity-
achieving distribution is the continuous Gaussian distri-
bution (Shannon, 1948). In addition, using the methods
of section IV, one can construct numerous cost-con-
strained channels with continuous capacity-achieving
Here we suggest that discrete synaptic states may
achieve optimal or nearly optimal information storage.
First, we point out that, surprisingly, not all continuous
input channels have optimal input distributions that
are continuous. In particular, imposing a constraint on
the maximum weight (or volume) of a single synapse
of synaptic weights from continuous form to a set of dis-
crete values. Such a maximum amplitude constraint is
quite natural from the biological point of view, because
neither volume nor EPSP can be infinitely large. Note
that, unlike in section III, where discreteness was an as-
sumption used to simplify mathematical analysis, here
For concreteness, we return to the AWGN channel
model considered in section II, but now we impose
a maximum weight constraint in addition to the average
volume constraint that was originally imposed. The
problem then reduces to the well-studied problem of
and peak input power. For the AWGN channel, the
unique optimal input distribution consists of a finite set
of points. A proof of this fact is based on methods of
convex optimization and mathematical analysis (Smith,
uous channels is based on sampling the input space
(Blahut, 1972) and cannot be used to determine whether
the optimal input distribution is continuous or discrete.
Consequently, an analytical proof is necessary.
Figure 4. Synaptic Cost Function Calculated from EPSP Measurements
(A) Typical distributions of EPSP amplitude among trials for synapses
characterized by different mean EPSP amplitudes.
(B) Synaptic cost function calculated from Equation 4.1 under the assumption
of optimal information storage. Each data point represents a different
synapse, with those appearing in (A) highlighted in red. Horizontal error
bars represent the standard error for the mean EPSP amplitude; vertical error
bars represent the standard error for the KL divergence quantity in
Equation 4.1. The standard error was estimated by the bootstrap
procedure described in Experimental Procedures. The points shown
with starred upper vertical error had infinite vertical error, as estimated
by the bootstrap procedure. The black line shows a least-squares
power law fit with exponent 0.48.
Since it is known that the optimal input distribution
consists of a finite number of points, one can numerically
search over this sequence of finite dimensional
spaces to find the locations and probabilities of these
points for particular average power and peak power
values. Moreover, there is a test procedure, based on
the necessity of satisfying the Karush-Kuhn-Tucker optimality
conditions, to determine whether the obtained
numerical solution is in fact optimal. So one can apply
the numerical procedure to generate a possible solution
and unmistakably recognize whether this solution is optimal
(Smith, 1971). Applying Smith’s optimization procedure,
including both the search and the test for optimality,
yields the following result for the AWGN
channel. For noise power 1, symmetric peak amplitude
constraint [−1.5, 1.5], and input power constraint
1.125 (an SNR close to 1), the optimal input distribution
consists of the zero point with large probability, and the
−1.5 and 1.5 points with equal smaller probability
(Smith, 1971) (see Figure S2).
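As a rough numerical illustration of this three-point example, the following Python sketch evaluates the mutual information of a discrete input over a unit-variance AWGN channel by numerical integration. The probabilities (0.5, 0.25, 0.25) are an assumption: they are the values for which a three-point input at 0 and ±1.5 meets the average power constraint 1.125 with equality, and they are not quoted from Smith (1971). The function name and grid settings are hypothetical choices.

```python
import numpy as np

def discrete_input_awgn_mi(points, probs, noise_std=1.0, n_grid=20001, span=12.0):
    """Mutual information I(A;B) in bits for B = A + Z, Z ~ N(0, noise_std**2),
    where A takes the values `points` with probabilities `probs`."""
    points = np.asarray(points, dtype=float)
    probs = np.asarray(probs, dtype=float)
    # Uniform grid on the output axis, wide enough that the tails are negligible.
    b = np.linspace(points.min() - span * noise_std,
                    points.max() + span * noise_std, n_grid)
    db = b[1] - b[0]
    # p(B) is a Gaussian mixture centered on the input points.
    z = (b[None, :] - points[:, None]) / noise_std
    p_b = probs @ (np.exp(-0.5 * z**2) / (noise_std * np.sqrt(2.0 * np.pi)))
    p_b = np.maximum(p_b, 1e-300)  # guard against log(0) far in the tails
    h_b = -np.sum(p_b * np.log2(p_b)) * db                   # output entropy h(B)
    h_z = 0.5 * np.log2(2.0 * np.pi * np.e * noise_std**2)   # noise entropy h(Z)
    return h_b - h_z

points = [-1.5, 0.0, 1.5]
probs = [0.25, 0.5, 0.25]  # assumed: meets the power constraint 1.125 exactly
power = sum(p * a**2 for p, a in zip(probs, points))
mi_three_point = discrete_input_awgn_mi(points, probs)
gaussian_bound = 0.5 * np.log2(1.0 + power)  # capacity without the peak constraint
print(f"power = {power}, I(A;B) = {mi_three_point:.3f} bits, "
      f"Gaussian bound = {gaussian_bound:.3f} bits")
```

The Gaussian value printed here is only an upper bound for comparison, since it ignores the peak amplitude constraint. The sketch does not implement Smith's search or the Karush-Kuhn-Tucker test; it only evaluates one candidate distribution.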
The conclusion that the distribution of synaptic
weights should be discrete valued holds not only for
the AWGN channel with hard limits imposed on synapse
size and weight, but also for other noise models. In par-
ticular, the discreteness result holds for a wide class of
additive noise channels under maximum amplitude con-
straint (Tchamkerten, 2004). Some fading channels that
have both additive and multiplicative noise and are sim-
ilarly constrained (Gursoy et al., 2005) also have this dis-
crete input property. Furthermore, channels other than
AWGN with constraints on both average power and
maximum amplitude have optimal input distributions
that consist of a finite number of discrete points (Huang
and Meyn, 2005).
A second observation is that—although there are
channels that have optimal input distributions that con-
sist of finite sets of discrete points—even channels that
have continuous optimal input distributions can be used
with discrete approximations of the optimal input distribution
and perform nearly at capacity. It is well known
that, in the average power constrained AWGN example
and in the limit of small SNR, the use of an alphabet
with only two symbols, ±⟨A²⟩^{1/2}, does not significantly
reduce information storage capacity (Figure S3; also
see Experimental Procedures). In addition, Huang and
distributions, in some cases generated by sampling
the optimal continuous distribution, are only slightly
VI. Theoretical Predictions and Experiment
In this section, we compare theoretical predictions with
known experimental data and suggest further experi-
mental tests of the theory.
In section II, we find that, by considering an AWGN
channel, information storage capacity increases as
sory volume (V0) is less than the volume of a synapse
with unitary SNR (VN), storage is optimized by synapses
with average volume given by the geometric mean of V0
and VN. However, small accessory volume has a detri-
mental effect on the system functionality, because the
conduction time delay diverges as V0 → 0. As the mini-
volume is of the order of the synaptic volume (Chklovskii
et al., 2002; Wen and Chklovskii, 2005), the competition
between these requirements results in the optimal mean
synaptic volume being less than or equal to VN. This re-
sult is corroborated in the discrete-states model of sec-
tion III, where optimal synaptic volume was found to be
2VN. Although the noise is not explicitly represented in
that model, the synapse volume is the minimum possi-
ble. These two results predict that typical synapses
should be small and consequently noisy. Indeed, Isope
and Barbour (2002) report an SNR of ∼0.6 at parallel fiber
synapses onto cerebellar Purkinje cells. This finding
is in keeping with our prediction that the SNR should be
less than 1 but not infinitesimal. More generally, our pre-
showing the noisiness of typical central synapses (Allen
and Stevens, 1994; Hessler et al., 1993; Isope and Bar-
bour, 2002; Mason et al., 1991; Raastad et al., 1992;
Rosenmund et al., 1993; Sayer et al., 1990).
In section III, we argue that optimal information stor-
age requires sparseness of synaptic connectivity, and
we predict a relationship between the filling fraction, f,
(Equations 3.7 and 3.8). To make a quantitative compar-
ison with empirical observations, we consider a mouse
cortical column. Potential synaptic connectivity in a cor-
tical column is all to all, meaning that axons and den-
drites of any two neurons pass sufficiently close to
each other that they can be connected through local
synaptogenesis (Chklovskii et al., 2004; Kalisman
et al., 2005; Le Bé and Markram, 2006; Stepanyants
and Chklovskii, 2005). According to Stepanyants et al.
(2002), the fraction of potential synapses converted
into actual ones in mouse cortex is ∼0.3; we take this
fraction to be our filling fraction, f ∼ 0.3. By using Equation 3.8,
we find that 2βVN = −ln(0.3) = 1.2, and by using
volume per actual synapse is of the same order as the
accessory volume per actual synapse V0/f, in agreement
with experiments (Chklovskii et al., 2002). More detailed
calculation using Equation 3.7 shows that actual syn-
apse volume should be about 40% greater than acces-
sory volume per actual synapse. In reality, wire volume
is greater than synapse volume. This may be a conse-
quence of minimizing conduction delays as discussed
in section III. Hopefully, a future optimization framework
that combines conduction delays and information stor-
age capacity will account for this discrepancy.
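The arithmetic behind this comparison can be sketched in a few lines of Python. The sketch assumes the relations 2βVN = −ln(f), β = −log(1 − f)/V0, and β = −log(f)/[(1 − f)⟨V⟩_{i>0}], which are our reading of Equations 3.7, 3.8, and EP.9 as used in this section; all variable names are hypothetical, and volumes are expressed in units of 1/β.

```python
import math

f = 0.3  # filling fraction from Stepanyants et al. (2002), as quoted in the text

# Equation 3.8 as used in the text: 2*beta*V_N = -ln(f)
two_beta_VN = -math.log(f)

# Assumed relation: beta = -log(1 - f) / V_0, so beta*V_0 = -ln(1 - f)
beta_V0 = -math.log(1.0 - f)

# Assumed relation: beta = -log(f) / [(1 - f) * <V>_{i>0}],
# so beta * <V>_{i>0} = -ln(f) / (1 - f)
beta_V_actual = -math.log(f) / (1.0 - f)

# Accessory volume per actual synapse is V_0 / f.
beta_V0_per_actual = beta_V0 / f

ratio = beta_V_actual / beta_V0_per_actual
print(f"2*beta*V_N = {two_beta_VN:.2f}")
print(f"<V>_(i>0) / (V_0/f) = {ratio:.2f}")
```

Under these assumptions the ratio comes out between 1.4 and 1.5, in line with the statement that actual synapse volume should be about 40% greater than accessory volume per actual synapse.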
Does this theory apply to the global brain network be-
global network seems consistent with the high cost of
wiring. However, a quantitative analysis is complicated
by the fact that—for a network that does not possess
potential all-to-all connectivity—the wiring cost de-
pends not just on the numbers of synapses but also on
which particular synapses are implemented. Therefore,
a detailed analysis of such a network would require
characterizing the cost and the information storage ca-
pacity of dendritic and axonal arbors quantitatively.
This is a difficult problem, because a theory of neuronal
arbors does not yet exist.
In section III, we predict that synaptic volume follows
an exponential distribution with the decay constant
β (Equation 3.3). This prediction can be tested experimentally
by measuring the volume of spine heads and
boutons in cortical neuropil. In comparing the distribution
of volume, one should keep in mind that we are referring
to the total volume of all synaptic contacts between
two neurons (section I). In addition, if one
measures the filling fraction in the same neuropil, the
test involves no fitting parameters because β can be
calculated from the wiring volume and the filling fraction;
from Equation EP.9 and Equation 3.8, we get that
β = −log(1 − f)/V0. To overcome the difficulty in measuring
VN or V0, one can alternatively measure the experimentally
accessible quantities f and ⟨V⟩_{i>0} to determine
β. Then, from Equations 3.7 and 3.8, β = −log(f)/
[(1 − f)⟨V⟩_{i>0}]. However, these predictions are only
approximate, as the relative importance of maximizing
information storage and minimizing conduction delays
is unknown. In fact, these properties may vary de-
pending on animal species, brain region, and animal
In section III, we predict the distribution of synaptic
weight for arbitrary values of α (Equation 3.9), which
can be compared to the experimentally observed syn-
aptic weight distribution obtained in neocortical layer
ison, we sort synaptic weights into bins [Ai − AN(Ai), Ai +
AN(Ai)] and plot a histogram (Figure S4). By performing
a least-squares fit of the logarithm of the EPSP distribution,
we find that the distribution is a stretched exponential
with exponent 0.49. A least-squares fit of the standard
deviation of EPSP amplitude as a function of
mean EPSP amplitude (Figure S1) yields a power law
with exponent 0.38. Hence, A/AN ∼ A^0.62, and from
Equation 3.9 we find that α = 0.49/0.62 = 0.79.
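The exponent arithmetic in this paragraph can be written out as a short Python sketch. It assumes, as the text states, that Equation 3.9 gives the volume-weight exponent as the ratio of the stretched-exponential exponent to the exponent of A/AN; the variable names are hypothetical.

```python
# Exponents reported in the text (least-squares fits to the EPSP dataset).
stretched_exp_exponent = 0.49   # fit to the log of the EPSP amplitude distribution
noise_exponent = 0.38           # A_N ~ A**0.38 (std vs. mean EPSP amplitude)

# A / A_N ~ A**(1 - 0.38) = A**0.62
snr_exponent = 1.0 - noise_exponent

# Per the text, Equation 3.9 yields the exponent as the ratio of the two.
alpha = stretched_exp_exponent / snr_exponent
print(round(alpha, 2))  # prints 0.79
```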
In section IV, we established a reverse link from the
distribution of synaptic weights and noise statistics
to the synaptic cost function. The best power-law fit
to the points in Figure 4B yields a sublinear cost function
with exponent ∼0.48 (Figure 4B). Recalling that A/AN ∼
A^0.62, we find that α = 0.48/0.62 = 0.77. This estimate is
similar to that obtained using the discrete-states model,
thus validating the use of that model to approximate the
continuous distribution of synaptic weights (section III).
The prediction of α can be tested directly by measuring
the relationship between synaptic volume and weight.
Such an experiment would involve jointly measuring
the physical and electrophysiological properties of indi-
vidual synapses. Should the relationship between syn-
aptic weight and volume differ from that predicted in
section IV, other factors may contribute to the distribu-
tion of synaptic weights.
In section V, we argue that discrete synaptic states
could optimize information storage almost as well as—
and under some conditions better than—synapses
with continuous weights. This does not prove that syn-
apses with discrete states are strictly optimal; it merely
suggests that they could be. There is experimental evi-
dence that changes in the weights of individual synap-
ses are, in fact, discrete (Lisman, 2003; O’Connor
et al., 2005; Petersen et al., 1998), which seems consis-
tent with maximizing information storage. However, our
model assumes the same relationship between synaptic
weight and volume for all synapses, which is only ap-
proximately correct. For example, synapses that are
more distant from the soma must be bigger to ensure
that somatic EPSP remains the same in the face of elec-
trotonic attenuation (Magee and Cook, 2000). Although
the optimal solution is not known in this case, we spec-
ulate that even if individual synapses were to have dis-
crete states, these states would not be the same among
all the synapses. In other words, the finding that individ-
ual synapses may change in discrete steps during plas-
ticity (Lisman, 2003; O’Connor et al., 2005; Petersen
et al., 1998) would not necessarily make the overall dis-
tribution of synaptic weights discrete.
Is there any evidence of discreteness in the distribu-
tion of synaptic weights? The distribution reported in
Song et al. (2005) is not monotonic; it has a maximum
at around 0.2 mV. Recalling that the full distribution of
synaptic states should include EPSPs of zero amplitude
(absent and silent synapses), there is a gap between the
zero-amplitude EPSP and the peak at around 0.2 mV
(Figure S1; also see Figure 5 in Song et al., 2005), which
hints that synaptic weight may be increased in discrete
steps and not continuously. Finally, we note that this
distribution was obtained combining data from hun-
dreds of animals (Song et al., 2005). Even if synaptic
states were discrete within one animal, such discrete-
ness would presumably be blurred by interanimal
We have argued that maximizing information storage
capacity per volume yields typical synapses that are
small and hence noisy. This explains experimental ob-
servations of synaptic unreliability. From the same prin-
ciple, we derived the distribution of synaptic weights
and found it to be a stretched exponential, in agreement
with existing measurements. This argument also ex-
plains the sparseness of synaptic connectivity. We
also suggest that the discreteness of synaptic states is
consistent with maximization of information storage
The strength of the information theory approach is
that it provides an upper bound on information storage
simply based on physical limitations, without explicitly
considering how memory is stored, retrieved, coded,
or decoded. For example, it is not known whether
error-correcting codes are used in the brain when in-
formation is stored across numerous synapses. Re-
gardless of whether error-correcting codes are used or
not, the capacity-achieving input distribution must be
used for optimal performance. Thus, our explanations
and predictions stand irrespective of whether or how
the brain uses error correction codes in information
Nevertheless, the independence of results obtained
using the information theory approach from any specific
implementation is also its weakness, because the impact of
unknown mechanisms is difficult to assess. Although information
theory provides physical limits on information
storage capacity, there could be other constraints due
to mechanisms of storage and read-out, as well as operational
requirements on the network. Neural network
models commonly assume specific mechanisms and
yield information storage capacity estimates different
from ours (Brunel et al., 2004; Gardner, 1987; McEliece
et al., 1987; Newman, 1988; Rolls and Treves, 1998). In-
terestingly, Brunel et al. (2004) predict a distribution of
synaptic weights similar to ours, although results such
as this one may depend on the details of the neural net-
work model at hand. Future research is needed to shed
more light on the biological mechanisms that shape and
constrain information storage and retrieval.
As our analysis relies on optimizing information stor-
age capacity, it is not applicable to brain regions for
which information storage is not the main task. For ex-
ample, synapses associated with early sensory pro-
cessing, e.g., in the retina (Laughlin et al., 1998; Sterling
and Matthews, 2005), or calyx of Held (von Gersdorff
and Borst, 2002), or those belonging to motorneurons
(Pierce and Mendell, 1993; Yeow and Peterson, 1991)
may be large and reliable. This would be consistent
with optimizing information transmission. In actuality,
any given brain circuit probably contributes to both
information storage and information transmission. In-
deed, by applying our analysis in reverse, one could
infer the role of a given circuit from its structural char-
acteristics. In particular, different cortical layers may be
optimized for a different combination of storage and
Our formulation of memory in the Shannon framework
implicitly casts each synapse—both potential and ac-
tual—as a channel usage. The total storage capacity is
age synaptic storage capacity. This makes the storage
capacity on the order of the number of synapses, which
would correspond to an overall maximal storage capacity
of several kilobits for a neocortical L5 pyramidal neuron
(Braitenberg and Schüz, 1998). It is possible, however,
that the synaptic information retrieval mechanism
involves multiple read-out attempts from a single syn-
apse. Since each channel usage is separated in space
rather than in time, this does not increase the number
of channel usages. Regardless, one may wonder what
impact multiple read-out attempts would have on our
analysis of information storage capacity.
It is known that the SNR increases approximately as
the square root of the number of read-out trials for
most forms of signal integration (Harrington, 1955), so
if the information stored in each synapse was retrieved
introduces a fixed multiplicative constant in Equation
porated into the VN term, and all of our results stand.
Conversely, if the number of read-out attempts is not
fixed, but varies across different synapses, then it would
ever, that multiple read-out attempts would lead to large
time delays. Yet, if information is used to control dynamical
systems, it is known that large delay can be disastrous
(Sahai and Mitter, 2006). In addition, it is not clear
how short-term plasticity caused by multiple read-out
attempts would be overcome.
Other possible concerns arise from the lack of a true
experimentally established input-output characteriza-
tion of synaptic memory. To address this concern would
require identification and description of the so-called
engram—the physical embodiment of memory—which
corresponds to the channel input, A (Figure 1). In addi-
tion, it would necessitate a better characterization of
the noise process that determines the input-output
probability distribution, p(B|A). Description of the alphabet
of A would furthermore settle the question, alluded
to in section V, of whether synapses are discrete valued
or continuously graded. In addition, we assumed in sec-
metic mean of EPSPs observed in several trials. Alterna-
tives to this assumption may alter the horizontal
coordinate of points in Figure 4B.
Although our analysis relies on identifying synaptic
noise with retrieval—or more specifically the variability
of EPSP amplitude on ‘‘read-out’’—the noise may also
come from other sources. The main concern is perhaps
that long-term memory storage at a synapse is open to
perturbations due to active learning rules and ongoing
neuronal activity (Zhou and Poo, 2004), the so-called in
the greater the perturbations caused by such processes
(although see Abraham et al., 2002). Under generic as-
sumptions, Amit and Fusi (1994) demonstrated that
this noise restricts memory capacity significantly and
even paradoxically. Fusi et al. (2005) recently proposed
a solution to this paradox: the introduction of a cascade
of synaptic states with different transition probabilities
results in a form of metaplasticity that increases reten-
tion times in the face of ongoing activity. Presumably,
other forms of metaplasticity may also help protect
stored information from unwanted perturbations. In ad-
dition, the stability of physiological synaptic plasticity
appears to depend critically on the details of activity
patterns during and after the induction of plasticity
(Zhou and Poo, 2004), suggesting that specific biologi-
cal mechanisms for the protection of stored information
Our theory can be modified to include sources of
noise other than retrieval. For example, if in situ noise
is quantified and turns out dominant, it can be used in
the calculations presented in sections II–IV. In fact, opti-
mality of noisy synapses (section II) may be relevant to
the resolution of the above paradox. In general, a better
understanding of the system functionality including
characterization of storage, in situ, and retrieval noise
should help specify p(B|A) in the future.
Finally, our contributions include not only many explanations
and predictions of physical structures, but also
the introduction of methods developed elsewhere to
the study of memory in the brain. For example, to de-
velop the optimization principles, we have applied infor-
mation theory to the study of physical neural memory
systems. Moreover, our application of an alternate for-
mulation of the capacity cost problem to study the
cost function (section IV) appears to be the first instance
ity has been successfully applied to real system analy-
sis, whether biological or human engineered. This prob-
lem inversion has wide applicability to the experimental
study of information systems.
Calculation of the Optimum Average Synapse Volume
Here we calculate analytically the optimum average synapse volume
⟨V⟩ that maximizes information storage capacity per volume Ivolume
for given accessory volume V0 and normalization VN. This problem
is mathematically identical to maximizing information transmission
along parallel pathways (Sarpeshkar, 1998). We take the derivative
of Equation 2.3 and set it to zero to obtain
This implies that the optimal ⟨V⟩ can be found by solving the
In the limiting cases, the optimizing average volume ⟨V⟩ and the
maximum storage capacity achieved are given by (Sarpeshkar,
The exact dependence of synaptic volume and the storage capacity
on the accessory volume is shown in Figure 2B.
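The optimization described in this subsection can be sketched numerically. Equation 2.3 is not reproduced here, so the sketch below assumes a specific form for it: the AWGN capacity with SNR equal to ⟨V⟩/VN, divided by the total volume ⟨V⟩ + V0. That form, the function names, and the example values of V0 are all assumptions made for illustration.

```python
import math

def info_per_volume(v, v0, vn):
    """Assumed form of Equation 2.3: AWGN capacity (bits) per unit volume."""
    return 0.5 * math.log2(1.0 + v / vn) / (v + v0)

def optimal_mean_volume(v0, vn, tol=1e-12):
    """Find <V> where the derivative of info_per_volume vanishes, by bisection."""
    def g(v):
        # Stationarity condition; g is strictly decreasing in v for v >= 0.
        return (v + v0) / (v + vn) - math.log(1.0 + v / vn)
    lo, hi = 0.0, vn
    while g(hi) > 0.0:       # expand the bracket until the sign changes
        hi *= 2.0
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

vn = 1.0
for v0 in (0.01, 0.1, 1.0):  # hypothetical accessory volumes, in units of V_N
    v_opt = optimal_mean_volume(v0, vn)
    print(v0, v_opt, info_per_volume(v_opt, v0, vn))
```

Under the assumed form, a small accessory volume gives an optimum well below VN, which is the qualitative behavior described in the main text.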
Derivation of Optimal Probability Distribution for Discrete
Zero-Error Synaptic States
Following Balasubramanian et al. (2001) and de Polavieja (2002,
2004), we first consider the problem of maximizing information stor-
in a given (average synaptic) volume
Both the volume constraint and the normalization condition for the
probabilities of synaptic weights can be included in the constrained
optimization by using Lagrange multipliers. Then we need to maxi-
By setting the derivatives of Equation 3.3 with respect to pi equal
to zero, we find that
where Z = Σi exp(−βVi) is a normalization constant (called the partition
function in statistical physics) and β is implicitly specified by the
Recall now that ⟨V⟩ is not given, and our objective is to maximize information
per unit cost. Such an optimization problem can be
β such that the partition function Z = 1, i.e.,
In this case, the probability expression (Equation EP.7) simplifies
Substituting this expression into Equation EP.6, we find that infor-
mation storage capacity is given by
Combining this expression with Equation 3.2, we find that infor-
mation per volume is given by
which is Equation 3.4 of the main text.
Distribution over Discrete Synaptic States Equidistant
in Volume Space
Sums appearing in Equations EP.9 and EP.11 can be expressed in
a closed form if we assume that
Then we can rewrite the normalization condition (Equation EP.9)
where we used an expression for the sum of the geometric series.
Multiplying both sides of this expression by the denominator, we
Average synapse volume (including accessory volume V0) is given
which is Equation 3.6 of the main text. In the limiting cases, these expressions
reduce to:
The volume of actual synapses (excluding accessory volume V0):
which is Equation 3.7 of the main text.
Finally the filling fraction:
which is Equation 3.8 of the main text.
Measurement of EPSPs in Synaptic Connections of L5
Our cost function computation (section IV) and experimental plots
(Figures S1 and S4) are based on the dataset from Sjöström et al.
(2001, 2003), where detailed methods have been previously de-
scribed. This dataset was analyzed with respect to connectivity pat-
terns and synaptic weights in Song et al. (2005). Briefly, acute visual
cortical slices were cut from rats aged P12–P20; whole-cell record-
ing configuration was established on up to four thick-tufted neocor-
tical layer V pyramidal neurons using a gluconate-based internal
solution; connectivity was assessed using a minimum of ten traces;
EPSPs were measured using a 1 ms window centered on the peak of
the averaged EPSP trace. The dataset consisted of recordings from
637 connected pairs of neurons. Between 11 and 150 responses
were recorded in each connected pair (repeated every 7–20 s to
avoid short-term depression); the vast majority of connections ad-
mitted between 40 and 65 responses. We define the synaptic weight
as the mean EPSP averaged across all responses.
Computing the Optimal Cost Function for Gaussian
Input-AWGN Channel and Zero Error Channel
For the AWGN channel, let Z be the random variable that represents
the independent additive noise. Then the expression in Equation 4.1,
up to affine transformation, reduces to
Since we are only interested in the quantity up to affine transfor-
mation, we need not calculate the first entropy term explicitly, since
it is constant, call it C1.
bining terms that need not be calculated explicitly. The result is
Equation 4.2 in the main text.
For the zero-error channel, the optimal cost function is
= −ln p(A = Aj) = −ln pi
Calculation of the Synaptic Cost Function
In order to compute the optimal cost function according to Equation
4.1, we require the channel output distribution as well as the channel
output cumulative distribution function, F(B), we simply use the em-
pirical cumulative distribution function, Femp(B). To account for the
variable number of EPSPs acquired from each synaptic connection,
the step size in the empirical cumulative distribution function con-
tributed by each data point is inversely proportional to the number
of absent synapses, we included in the empirical distribution func-
tion a steep Gaussian distribution function with mean at zero EPSP
amplitude and standard deviation of 0.1 mV (typical noise ampli-
tude). The area under this Gaussian distribution function is given by
the channel conditional density, p(B|A), we assume that all EPSPs
from a given synapse correspond to the same input letter; further-
more, we make a correspondence between the mean EPSP ampli-
tude and this input letter. For each synapse, we use a histogram
with ten uniformly spaced bins to estimate the conditional density,
pemp(B|A). Then the KL divergence is approximated by the following:
where B = bi implies presence in the ith histogram bin, and the right
and left bin edges are denoted ri and li, respectively. This KL divergence
is computed for each synapse, and the result is the optimal
cost function (Figure 4B).
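The estimation procedure above can be sketched in Python. This is one plausible reading of the procedure rather than the original analysis code: Equation EP.20 is not reproduced here, so the sketch assumes that the KL divergence is a sum over histogram bins of pemp(bi|A) log[pemp(bi|A)/(Femp(ri) − Femp(li))]. It omits the Gaussian component added for absent synapses, and the function name and input layout are hypothetical.

```python
import numpy as np

def cost_function_estimates(epsps_by_synapse, n_bins=10):
    """Return a list of (mean EPSP, KL divergence in nats), one pair per synapse.

    epsps_by_synapse: list of 1-D arrays of EPSP amplitudes, one array per synapse.
    """
    pooled = np.concatenate(epsps_by_synapse)
    # Each data point gets weight 1/(number of trials for its synapse), so
    # synapses with many trials do not dominate the marginal distribution.
    weights = np.concatenate(
        [np.full(len(x), 1.0 / len(x)) for x in epsps_by_synapse])
    weights /= weights.sum()

    results = []
    for x in epsps_by_synapse:
        x = np.asarray(x, dtype=float)
        # Conditional distribution p_emp(B|A): histogram with uniform bins.
        counts, edges = np.histogram(x, bins=n_bins)
        p_cond = counts / counts.sum()
        # Marginal probability mass in the same bins, F_emp(r_i) - F_emp(l_i).
        p_marg, _ = np.histogram(pooled, bins=edges, weights=weights)
        nz = p_cond > 0  # empty bins contribute nothing to the sum
        kl = np.sum(p_cond[nz] * np.log(p_cond[nz] / p_marg[nz]))
        results.append((x.mean(), kl))
    return results
```

Because each synapse's own trials are part of the pooled data, the marginal mass is nonzero wherever the conditional mass is nonzero, so the estimate stays finite in this sketch. Infinite values arise only when the marginal is built from a resampled set of synapses that excludes the synapse in question, as in the bootstrap described next.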
We use the bootstrap method to determine confidence intervals.
The confidence interval represented by the horizontal error bars is
a small sample estimate of the standard error of the mean EPSP am-
plitude. The estimate is the sample standard deviation of the mean
EPSP amplitudes generated by sampling with replacement from
the empirical distribution that were measured for each synapse.
The result is based on 50 bootstrap trials. The confidence interval
represented by the vertical error bars is a small sample estimate of
the standard error of the KL divergence quantity in Equation
EP.20. In order to do the sampling with replacement for the boot-
strap procedure, randomness is introduced in two ways. First, 637
synapses are selected uniformly with replacement from the set of
637 measured synapses. These resampled synapses are used to
generate the empirical cumulative distribution function, Femp(B).
Second, for each of the measured synapses, EPSP values are selected
uniformly with replacement to produce the empirical conditional
probability mass functions pemp(B|a) for each of the 637 values
of a. Finally, Equation EP.20 is used to calculate a bootstrap version
of the data points in Figure 4B. This doubly stochastic resampling
procedure is repeated for 50 bootstrap trials, and the estimate is
the sample standard deviation of the KL divergence quantities for
each of the synapses. When we resample the synapses to generate
the Femp(B) for each of the bootstrap trials, there is a possibility that
no large EPSP amplitude synapse is selected. In such a case, the KL
divergence quantity will come out to be infinite since the output dis-
tribution will be zero where the conditional distribution will be non-
zero, so the two distributions will not be absolutely continuous
with respect to each other. More simply, Femp(B = ri mV) will equal
Femp(B = li mV), so the denominator inside the logarithm in Equation
EP.20 will be zero and cause the entire expression to be infinite. This
phenomenon of lack of distribution overlap resulting in an infinite
discontinuity in the KL divergence functional causes the two infinite
upper standard deviation errors represented by stars in Figure 4B.
For these two points, the lower standard deviation is estimated by
excluding points greater than the mean. For other points, the bootstrap
distributions are approximately symmetric, so the vertical error
bars are symmetrically replicated. The unweighted least-squares
fit in the MATLAB curve-fitting toolbox is used to generate the fit to
a function of the form vA^η with η = 0.48 (Figure 4B).
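The simpler of the two bootstrap estimates, the standard error of the mean EPSP amplitude for one synapse, can be sketched as follows. The sketch is in Python rather than MATLAB, the function name is hypothetical, and the example amplitudes are synthetic values generated for illustration, not measured data.

```python
import numpy as np

def bootstrap_se_of_mean(epsps, n_boot=50, rng=None):
    """Bootstrap estimate of the standard error of the mean EPSP amplitude."""
    rng = np.random.default_rng() if rng is None else rng
    epsps = np.asarray(epsps, dtype=float)
    # Resample the trials with replacement and record the mean each time.
    boot_means = [rng.choice(epsps, size=epsps.size, replace=True).mean()
                  for _ in range(n_boot)]
    # The estimate is the sample standard deviation of the bootstrap means.
    return np.std(boot_means, ddof=1)

rng = np.random.default_rng(0)
example_epsps = rng.normal(0.5, 0.2, size=60)  # synthetic amplitudes (mV), not real data
se = bootstrap_se_of_mean(example_epsps, n_boot=50, rng=rng)
print(se)
```

The vertical error bars follow the same pattern with a second level of resampling, over synapses, before the KL divergence is recomputed in each bootstrap trial.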
AWGN Channel Capacity at Low SNR
The information capacity of the AWGN channel with binary
input ±⟨A²⟩^{1/2} is achieved by using the two inputs with equal probability.
Define the variable SNR to be
Perturbing the input values with Gaussian noise yields the output
where the channel output has been normalized by the noise amplitude:
b → b/AN. The output distribution
then has entropy
which can be computed numerically.
The capacity is then given by
where we have used the property that for additive noise channels,
the conditional entropy is the entropy of the noise process as well
as the known entropy of Gaussian random variables. The value of
the mutual information is close to the value from Equation 2.1 at
small SNR (Figure S3).
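The computation outlined in this subsection can be sketched in Python. The sketch assumes unit noise variance after normalization, equiprobable inputs at ±√SNR, and the standard Shannon expression (1/2)log2(1 + SNR) for the Gaussian-input capacity, which we take to be what Equation 2.1 denotes. The function names and grid settings are hypothetical.

```python
import numpy as np

def binary_awgn_capacity(snr, n_grid=20001, span=12.0):
    """Capacity in bits of the AWGN channel with equiprobable inputs +/-sqrt(snr)
    and unit noise variance, computed as h(B) - h(Z)."""
    s = np.sqrt(snr)
    b = np.linspace(-s - span, s + span, n_grid)
    db = b[1] - b[0]
    norm = 1.0 / np.sqrt(2.0 * np.pi)
    # Output density: equal mixture of two unit-variance Gaussians at +/- s.
    p_b = 0.5 * norm * (np.exp(-0.5 * (b - s)**2) + np.exp(-0.5 * (b + s)**2))
    p_b = np.maximum(p_b, 1e-300)  # guard against log(0) in the far tails
    h_b = -np.sum(p_b * np.log2(p_b)) * db    # output entropy, numerically
    h_z = 0.5 * np.log2(2.0 * np.pi * np.e)   # entropy of unit-variance Gaussian noise
    return h_b - h_z

def gaussian_awgn_capacity(snr):
    """Shannon capacity in bits under an average power constraint."""
    return 0.5 * np.log2(1.0 + snr)

for snr in (0.01, 0.1, 1.0, 10.0):
    ratio = binary_awgn_capacity(snr) / gaussian_awgn_capacity(snr)
    print(f"SNR = {snr:5.2f}: binary / Gaussian capacity = {ratio:.3f}")
```

The ratio stays close to 1 at small SNR and falls off at large SNR, where the binary alphabet cannot carry more than one bit per synapse.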
The Supplemental Data for this article can be found online at http://
We are grateful to M. DeWeese, A. Koulakov, C. Machens, R. Mali-
now, and S.K. Mitter for commenting on the early versions of the
manuscript and to A. Roth, Y. Mishchenko, M. Häusser, L. Srinivasan,
and S.B. Nelson for discussions. We also thank the anonymous
reviewers and the editor for helping clarify our presentation. This
work was supported by the NIH Grant MH69838, the Klingenstein
Foundation Award, an NSF Graduate Research Fellowship, the
NSF Grant CCR-0325774, the Wellcome Trust, and an EU Marie
Received: May 4, 2005
Revised: April 27, 2006
Accepted: October 10, 2006
Published: November 8, 2006
Abraham, W.C., Logan, B., Greenwood, J.M., and Dragunow, M.
(2002). Induction and experience-dependent consolidation of stable
long-term potentiation lasting months in the hippocampus. J. Neu-
rosci. 22, 9626–9634.
Allen, C., and Stevens, C.F. (1994). An evaluation of causes for unre-
liability of synaptic transmission. Proc. Natl. Acad. Sci. USA 91,
Amit, D.J., and Fusi, S. (1994). Learning in neural networks with ma-
terial synapses. Neural Comput. 6, 957–982.
Arimoto, S. (1972). An algorithm for computing the capacity of arbi-
trary discrete memoryless channels. IEEE Trans. Inform. Theory IT-
Balasubramanian, V., Kimber, D., and Berry, M.J., II. (2001). Meta-
bolically efficient information processing. Neural Comput. 13, 799–
Bekkers, J.M., and Stevens, C.F. (1995). Quantal analysis of EPSCs
recorded from small numbers of synapses in hippocampal cultures.
J. Neurophysiol. 73, 1145–1156.
Blahut, R.E. (1972). Computation of channel capacity and rate-
distortion functions. IEEE Trans. Inform. Theory IT-18, 460–473.
Braitenberg, V., and Schüz, A. (1998). Cortex: Statistics and Geometry
of Neuronal Connectivity (Berlin, New York: Springer).
Brunel, N., Hakim, V., Isope, P., Nadal, J.-P., and Barbour, B. (2004).
Optimal information storage and the distribution of synaptic
weights: perceptron versus Purkinje cell. Neuron 43, 745–757.
Cash, S., and Yuste, R. (1999). Linear summation of excitatory inputs
by CA1 pyramidal neurons. Neuron 22, 383–394.
Cherniak, C., Changizi, M., and Kang, D.W. (1999). Large-scale opti-
mization of neuron arbors. Phys. Rev. E Stat. Phys. Plasmas Fluids
Relat. Interdiscip. Topics 59, 6001–6009.
Chklovskii, D.B. (2004). Synaptic connectivity and neuronal mor-
phology: two sides of the same coin. Neuron 43, 609–617.
Chklovskii, D.B., Schikorski, T., and Stevens, C.F. (2002). Wiring op-
timization in cortical circuits. Neuron 34, 341–347.
and information storage. Nature 431, 782–788.
Csiszár, I., and Körner, J. (1997). Information Theory: Coding Theorems
for Discrete Memoryless Systems (Budapest: Akadémiai
de Polavieja, G.G. (2002). Errors drive the evolution of biological sig-
nalling to costly codes. J. Theor. Biol. 214, 657–664.
de Polavieja, G.G. (2004). Reliable biological communication with realistic
constraints. Phys. Rev. E Stat. Nonlin. Soft. Matter Phys. 70,
del Castillo, J., and Katz, B. (1954). Quantal components of the end-
plate potential. J. Physiol. (London) 124, 560–573.
Eldridge, D.F. (1963). A special application of information theory to
recording systems. IEEE Trans. Audio AU-11, 3–6.
Faisal, A.A., White, J.A., and Laughlin, S.B. (2005). Ion-channel noise
places limits on the miniaturization of the brain’s wiring. Curr. Biol.
Fusi, S., Drew, P.J., and Abbott, L.F. (2005). Cascade models of syn-
aptically stored memories. Neuron 45, 599–611.
Gardner, E. (1987). Maximum storage capacity in neural networks.
Europhys. Lett. 4, 481–485.
Gastpar, M. (2003). To code or not to code. PhD dissertation, École
Polytechnique Fédérale de Lausanne, Switzerland.
Gastpar, M., Rimoldi, B., and Vetterli, M. (2003). To code, or not to
code: lossy source-channel communication revisited. IEEE Trans.
Inform. Theory 49, 1147–1158.
Goldman, M.S. (2004). Enhancement of information transmission ef-
ficiency by synaptic failures. Neural Comput. 16, 1137–1162.
Gursoy, M.C., Poor, H.V., and Verdú, S. (2005). The noncoherent Rician
fading channel—part I: structure of the capacity-achieving input.
IEEE Trans. Wireless Commun. 4, 2193–2206.
Harrington, J.V. (1955). An analysis of the detection of repeated sig-
nals in noise by binary integration. IRE Trans. Inform. Theory 1, 1–9.
Hebb, D.O. (1949). The Organization of Behavior: A Neuropsycho-
logical Theory (New York: Wiley).
Hessler, N.A., Shirke, A.M., and Malinow, R. (1993). The probability
of transmitter release at a mammalian central synapse. Nature
Holmgren, C., Harkany, T., Svennenfors, B., and Zilberter, Y. (2003).
Pyramidal cell communication within local networks in layer 2/3 of
rat neocortex. J. Physiol. 551, 139–153.
Hsu, A., Tsukamoto, Y., Smith, R.G., and Sterling, P. (1998). Func-
tional architecture of primate cone and rod axons. Vision Res. 38,
Huang, J., and Meyn, S.P. (2005). Characterization and computation
of optimal distributions for channel coding. IEEE Trans. Inform. The-
ory 51, 2336–2351.
Immink, K.E.S., Siegel, P.H., and Wolf, J.K. (1998). Codes for digital
recorders. IEEE Trans. Inform. Theory 44, 2260–2299.
Purkinje cell synapses in adult rat cerebellar slices. J. Neurosci. 22,
Jimbo, M., and Kunisawa, K. (1979). An iteration method for calculating
the relative capacity. Information and Control 43, 216–223.
Kalisman, N., Silberberg, G., and Markram, H. (2005). The neocorti-
cal microcircuit as a tabula rasa. Proc. Natl. Acad. Sci. USA 102,
Kasai, H., Matsuzaki, M., Noguchi, J., Yasumatsu, N., and Nakahara,
H. (2003). Structure-stability-function relationships of dendritic
spines. Trends Neurosci. 26, 360–368.
Koch, C. (1999). Biophysics of Computation: Information Processing
in Single Neurons (New York: Oxford University Press).
Koester, H.J., and Johnston, D. (2005). Target cell-dependent nor-
malization of transmitter release at neocortical synapses. Science
Kolmogorov, A., and Tihomirov, V. (1959). ε-entropy and ε-capacity
of sets in functional spaces. Uspekhi Matematicheskikh Nauk 14,
Kopec, C.D., Li, B., Wei, W., Boehm, J., and Malinow, R. (2006). Glu-
tamate receptor exocytosis and spine enlargement during chemi-
cally induced long-term potentiation. J. Neurosci. 26, 2000–2009.
Laughlin, S.B., de Ruyter van Steveninck, R.R., and Anderson, J.C.
(1998). The metabolic cost of neural information. Nat. Neurosci. 1,
Le Be, J.V., and Markram, H. (2006). Spontaneous and evoked syn-
aptic rewiring in the neonatal neocortex. Proc. Natl. Acad. Sci. USA
Levy, W.B., and Baxter, R.A. (1996). Energy efficient neural codes.
Neural Comput. 8, 531–543.
Levy, W.B., and Baxter, R.A. (2002). Energy-efficient neuronal com-
putation via quantal synaptic failures. J. Neurosci. 22, 4746–4755.
Lisman, J. (2003). Long-term potentiation: outstanding questions
and attempted synthesis. Proc. R. Soc. Lond. B Biol. Sci. 358, 829–
Lisman, J.E., and Harris, K.M. (1993). Quantal analysis and synaptic
anatomy–integrating two views of hippocampal plasticity. Trends
Neurosci. 16, 141–147.
Lynch, M.A. (2004). Long-term potentiation and memory. Physiol.
Rev. 84, 87–136.
pendent of synapse location in hippocampal pyramidal neurons.
Nat. Neurosci. 3, 895–903.
Manwani, A., and Koch, C. (2000). Detecting and estimating signals
over noisy and unreliable synapses: information-theoretic analysis.
Neural Comput. 13, 1–33.
Markram, H. (1997). A network of tufted layer 5 pyramidal neurons.
Cereb. Cortex 7, 523–533.
Markram, H., Lübke, J., Frotscher, M., Roth, A., and Sakmann, B.
(1997). Physiology and anatomy of synaptic connections between
thick tufted pyramidal neurones in the developing rat neocortex.
J. Physiol. 500, 409–440.
Mason, A., Nicoll, A., and Stratford, K. (1991). Synaptic transmission
J. Neurosci. 11, 72–84.
Matsuzaki, M., Ellis-Davies, G.C.R., Nemoto, T., Miyashita, Y., Iino,
M., and Kasai, H. (2001). Dendritic spine geometry is critical for
AMPA receptor expression in hippocampal CA1 pyramidal neurons.
Nat. Neurosci. 4, 1086–1092.
Matsuzaki, M., Honkura, N., Ellis-Davies, G.C., and Kasai, H. (2004).
Structural basis of long-term potentiation in single dendritic spines.
Nature 429, 761–766.
McEliece, R. (1977). The Theory of Information and Coding: A Mathematical
Framework for Communication (London: Addison-Wesley).
McEliece, R.J., Posner, E.C., Rodemich, E.R., and Venkatesh, S.S.
(1987). The capacity of the Hopfield associative memory. IEEE
Trans. Inform. Theory IT-33, 461–482.
McGaugh, J.L. (2000). Memory—a century of consolidation. Science
Mitchison, G. (1991). Neuronal branching patterns and the economy
of cortical wiring. Proc. R. Soc. Lond. B Biol. Sci. 245, 151–158.
Morris, R.G. (2003). Long-term potentiation and memory. Proc. R.
Soc. Lond. B Biol. Sci. 358, 643–647.
Murthy, V.N., Schikorski, T., Stevens, C.F., and Zhu, Y. (2001). Inac-
tivity produces increases in neurotransmitter release and synapse
size. Neuron 32, 673–682.
Newman, C. (1988). Memory capacity in neural network models: rig-
orous lower bounds. Neural Netw. 1, 223–238.
gyi, P. (1998). Cell type and pathway dependence of synaptic AMPA
receptor number and variability in the hippocampus. Neuron 21,
bidirectional synaptic plasticity is composed of switch-like unitary
events. Proc. Natl. Acad. Sci. USA 102, 9679–9684.
Petersen, C.C., Malenka, R.C., Nicoll, R.A., and Hopfield, J.J. (1998).
All-or-none potentiation at CA3-CA1 synapses. Proc. Natl. Acad.
Sci. USA 95, 4732–4737.
Pierce, J.P., and Mendell, L.M. (1993). Quantitative ultrastructure of
Ia boutons in the ventral horn: scaling and positional relationships.
J. Neurosci. 13, 4748–4763.
Poirazi, P., Brannon, T., and Mel, B.W. (2003). Arithmetic of sub-
threshold synaptic summation in a model CA1 pyramidal cell.
Neuron 37, 977–987.
Polsky, A., Mel, B.W., and Schiller, J. (2004). Computational subunits
in thin dendrites of pyramidal cells. Nat. Neurosci. 7, 621–627.
Raastad, M., Storm, J.F., and Andersen, P. (1992). Putative single
quantum and single fibre excitatory postsynaptic currents show
similar amplitude range and variability in rat hippocampal slices.
Eur. J. Neurosci. 4, 113–117.
de los Vertebrados (New York: Springer).
Rieke, F., Warland, D., de Ruyter van Steveninck, R., and Bialek, W.
(1997). Spikes: Exploring the Neural Code (Cambridge, MA: The MIT
Rolls, E., and Treves, A. (1998). Neural Networks and Brain Function
(Cambridge: Oxford University Press).
Root, W.L. (1968). Estimates of ε capacity for certain linear communication
channels. IEEE Trans. Inform. Theory IT-14, 361–369.
Rosenmund, C., Clements, J.D., and Westbrook, G.L. (1993). Non-
uniform probability of glutamate release at a hippocampal synapse.
Science 262, 754–757.
Sahai, A., and Mitter, S.K. (2006). The necessity and sufficiency of
anytime capacity for stabilization of a linear system over a noisy
communication link Part I: scalar systems. IEEE Trans. Inform.
Theory 52, 3369–3395.
Sarpeshkar, R. (1998). Analog versus digital: extrapolating from
electronics to neurobiology. Neural Comput. 10, 1601–1638.
Sayer, R.J., Friedlander, M.J., and Redman, S.J. (1990). The time
course and amplitude of EPSPs evoked at synapses between
pairs of CA3/CA1 neurons in the hippocampal slice. J. Neurosci.
Schikorski, T., and Stevens, C.F. (1997). Quantitative ultrastructural
analysis of hippocampal excitatory synapses. J. Neurosci. 17, 5858–
Schreiber, S., Machens, C.K., Herz, A.V.M., and Laughlin, S.B.
(2002). Energy-efficient coding with discrete stochastic events.
Neural Comput. 14, 1323–1346.
Shannon, C.E. (1948). A mathematical theory of communication. Bell
Syst. Tech. J. 27, 379–423 and 623–656.
Shannon, C.E. (1959). Coding theorems for a discrete source with
a fidelity criterion. IRE National Convention Record 4, 142–163.
Silver, R.A., Lübke, J., Sakmann, B., and Feldmeyer, D. (2003). High-probability
uniquantal transmission at excitatory synapses in barrel
cortex. Science 302, 1981–1984.
Sjöström, P.J., Turrigiano, G.G., and Nelson, S.B. (2001). Rate, timing,
and cooperativity jointly determine cortical synaptic plasticity.
Neuron 32, 1149–1164.
Sjöström, P.J., Turrigiano, G.G., and Nelson, S.B. (2003). Neocortical
LTD via coincident activation of presynaptic NMDA and cannabinoid
receptors. Neuron 39, 641–654.
Smith, J.G. (1971). The information capacity of amplitude- and vari-
ance-constrained scalar Gaussian channels. Inf. Control 18, 203–
Song, S., Sjöström, P.J., Reigl, M., Nelson, S., and Chklovskii, D.B.
(2005). Highly nonrandom features of synaptic connectivity in local
cortical circuits. PLoS Biol. 3, 0507–0519. 10.1371/journal.pbio.
Stepanyants, A., Hof, P.R., and Chklovskii, D.B. (2002). Geometry
and structural plasticity of synaptic connectivity. Neuron 34, 275–
Stepanyants, A., and Chklovskii, D.B. (2005). Neurogeometry and
potential synaptic connectivity. Trends Neurosci. 28, 387–394.
Sterling, P., and Matthews, G. (2005). Structure and function of
ribbon synapses. Trends Neurosci. 28, 20–29.
Streichert, L.C., and Sargent, P.B. (1989). Bouton ultrastructure and
synaptic growth in a frog autonomic ganglion. J. Comp. Neurol. 281,
Takumi, Y., Ramírez-León, V., Laake, P., Rinvik, E., and Ottersen,
O.P. (1999). Different modes of expression of AMPA and NMDA re-
ceptors in hippocampal synapses. Nat. Neurosci. 2, 618–624.
Tanaka, J., Matsuzaki, M., Tarusawa, E., Momiyama, A., Molnar, E.,
Kasai, H., and Shigemoto, R. (2005). Number and density of AMPA
receptors in single synapses in immature cerebellum. J. Neurosci.
Tchamkerten, A. (2004). On the discreteness of capacity-achieving
distributions. IEEE Trans. Inform. Theory 50, 2773–2778.
in the neocortex. Cereb. Cortex 13, 5–14.
Thomson, A.M., West, D.C., Wang, Y., and Bannister, A.P. (2002).
Synaptic connections and small circuits involving excitatory and in-
hibitory neurons in layers 2-5 of adult rat and cat neocortex: triple in-
tracellular recordings and biocytin labelling in vitro. Cereb. Cortex
Verdú, S. (1990). On channel capacity per unit cost. IEEE Trans. Inform.
Theory 36, 1019–1030.
Verdú, S. (2002). Spectral efficiency in the wideband regime. IEEE
Trans. Inform. Theory 48, 1319–1343.
von Gersdorff, H., and Borst, J.G. (2002). Short-term plasticity at the
calyx of Held. Nat. Rev. Neurosci. 3, 53–64.
Wen, Q., and Chklovskii, D.B. (2005). Segregation of the brain into
gray and white matter: a design minimizing conduction delays.
PLoS Comput. Biol. 1, e78. 10.1371/journal.pcbi.0010078.
Yeow, M.B., and Peterson, E.H. (1991). Active zone organization and
vesicle content scale with bouton size at a vertebrate central syn-
apse. J. Comp. Neurol. 307, 475–486.
Zador, A. (1998). Impact of synaptic unreliability on the information
transmitted by spiking neurons. J. Neurophysiol. 79, 1219–1229.
Zhou, Q., and Poo, M.M. (2004). Reversal and consolidation of activ-
ity-induced synaptic modifications. Trends Neurosci. 27, 378–383.
Zhou, Q., Homma, K.J., and Poo, M.M. (2004). Shrinkage of dendritic
spines associated with long-term depression of hippocampal syn-
apses. Neuron 44, 749–757.