Characterization of minimum error linear coding with sensory and neural noise.
Eizaburo Doi
edoi@cns.nyu.edu
Center for Neural Science, New York University, New York, NY 10003, U.S.A.
Michael S. Lewicki
michael.lewicki@case.edu
Electrical Engineering and Computer Science Department, Case Western Reserve University, Cleveland, OH 44106, U.S.A.
Abstract: Robust coding has been proposed as a solution to the problem of minimizing decoding error in the presence of neural noise. Many real-world problems, however, have degradation in the input signal, not just in neural representations. This generalized problem is more relevant to biological sensory coding, where internal noise arises from limited neural precision and external noise arises from distortion of the sensory signal, such as blurring and phototransduction noise. In this note, we show that the optimal linear encoder for this problem can be decomposed exactly into two serial processes that can be optimized separately. One is Wiener filtering, which optimally compensates for input degradation. The other is robust coding, which makes the best use of the available representational capacity for signal transmission with a noisy population of linear neurons. We also present a spectral analysis of the decomposition that characterizes how the reconstruction error is minimized under different input signal spectra, types and amounts of degradation, degrees of neural precision, and neural population sizes.
1 Introduction
We address the problem of how to linearly transform N-dimensional inputs into M-dimensional representations in order to best transmit signal information through noisy Gaussian channels when the input signal is degraded. Here the degradation is modeled by additive white Gaussian noise as well as a linear distortion such as blurring. Most earlier studies have addressed the problem without such degradation (for a recent review, see Palomar and Jiang, 2007, and the references therein), although it is inevitable in almost any real system. It is therefore important to characterize and understand its impact on the optimal encoder. The main result of this note is that, under a minimum mean squared error (MMSE) objective, the optimal linear transform can be decomposed exactly into optimal de-noising/de-blurring (Wiener filtering) and optimal coding for a noisy Gaussian channel (robust coding; Lee, 1975; Lee and Petersen, 1976; Doi et al., 2007). Such a decomposition was shown earlier in the context of source coding and quantization (Dobrushin and Tsybakov, 1962; Sakrison, 1968) and proven in a general case (Wolf and Ziv, 1970). Although it has been studied extensively (for a review, see Gray and Neuhoff, 1998, section V-G), a unified treatment and characterization in the context of MMSE linear coding has not been provided. We also offer an alternative proof of the decomposition in the course of characterizing the solution.
A special case of this problem was examined previously, in which the linear transform was assumed to be convolutional, implying that the channel (or neural) dimension equals the input (or sensory) dimension and that individual coding units are shifted versions of each other with an identical filter shape (Ruderman, 1994). These simplifying assumptions have commonly been made in the study of optimal sensory coding because they match certain sensory systems (e.g., the foveal retina) reasonably well and are analytically more tractable (Atick and Redlich, 1990; Atick et al., 1990; Atick and Redlich, 1992; van Hateren, 1992). Also, an approximate decomposition along similar lines has been used under an information maximization objective (Atick and Redlich, 1992; Atick, 1992; Dayan and Abbott, 2001, Ch. 4).
Here, we assume no such restrictions and provide a general solution with the exact decomposition, in which the channel dimension may be smaller than, equal to, or larger than the input dimension (undercomplete, complete, or overcomplete representations, respectively), and individual coding units are not restricted to have the same filter shape.
2 Problem formulation
The model system consists of three processes (Figure 1a): generation of the observed signal (eq. 1), encoding (eq. 2), and decoding (eq. 3). The observed signal $x \in \mathbb{R}^N$ is a degraded version of the original signal $s \in \mathbb{R}^N$, obtained via a fixed linear transform followed by additive white Gaussian noise (AWGN):
$$x = Hs + \nu \tag{1}$$
where $s$ is zero mean with covariance $\Sigma_s$, $H \in \mathbb{R}^{N\times N}$ is a fixed linear distortion, and $\nu \sim \mathcal{N}(0_N, \sigma_\nu^2 I_N)$ is sensory (or input) noise. Note that the signal $s$ need not be Gaussian distributed. Encoding is assumed to be a linear transform into an arbitrary dimension, and the resulting representations may be undercomplete, complete, or overcomplete:
$$r = Wx + \delta \tag{2}$$
with $W \in \mathbb{R}^{M\times N}$ the linear encoder, $\delta \sim \mathcal{N}(0_M, \sigma_\delta^2 I_M)$ neural (channel, or output) noise, and $r$ the representation. Decoding is assumed to be linear:
$$\hat{s} = Ar \tag{3}$$
where $A \in \mathbb{R}^{N\times M}$ is the linear decoder and $\hat{s}$ is the estimate of the original signal. The decoding error $\epsilon$ is
$$\epsilon = s - \hat{s} \tag{4}$$
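and the mean squared error (MSE) $E$ is
$$E = \langle \epsilon^T\epsilon \rangle \tag{5}$$
$$\;\;= \mathrm{tr}\langle \epsilon\epsilon^T \rangle \tag{6}$$
$$\;\;= \mathrm{tr}(\Sigma_s) - 2\,\mathrm{tr}(AWH\Sigma_s) + \mathrm{tr}(AW\Sigma_x W^T A^T) + \sigma_\delta^2\,\mathrm{tr}(AA^T) \tag{7}$$
where $\langle\cdot\rangle$ denotes the sample average and $\Sigma_x = H\Sigma_s H^T + \sigma_\nu^2 I_N$ is the covariance of the observations $x$.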
We are interested in minimizing the MSE subject to one of the following power constraints:
$$\mathrm{tr}(W\Sigma_x W^T) = P \tag{8}$$
$$\mathrm{diag}(W\Sigma_x W^T) = \frac{P}{M}\mathbf{1}_M \tag{9}$$
where eq. 8 constrains the sum of the variances of the filter outputs (referred to as the total power constraint), while eq. 9 constrains each individual variance (the individual power constraint; note that this is a sufficient condition for the total power constraint) (Lee, 1975; Lee and Petersen, 1976).
The problem is to find W and A that minimize E subject to one of the power constraints
above.
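The objective in eq. 7 depends only on second-order statistics, so it can be checked against a direct simulation of eqs. 1-4. The following is a minimal NumPy sketch; the function names `mse_closed_form` and `mse_monte_carlo` are illustrative, the matrices are random stand-ins, and the simulation draws a Gaussian s only for convenience (the closed form itself does not require Gaussianity).

```python
import numpy as np

def mse_closed_form(W, A, H, Sigma_s, sigma_nu2, sigma_delta2):
    """Eq. 7: tr(Σs) - 2 tr(A W H Σs) + tr(A W Σx W^T A^T) + σδ² tr(A A^T)."""
    N = Sigma_s.shape[0]
    Sigma_x = H @ Sigma_s @ H.T + sigma_nu2 * np.eye(N)   # covariance of x = Hs + ν
    return (np.trace(Sigma_s)
            - 2 * np.trace(A @ W @ H @ Sigma_s)
            + np.trace(A @ W @ Sigma_x @ W.T @ A.T)
            + sigma_delta2 * np.trace(A @ A.T))

def mse_monte_carlo(W, A, H, Sigma_s, sigma_nu2, sigma_delta2, n=200_000, seed=0):
    """Simulate eqs. 1-4 with a Gaussian s and average ||s - ŝ||² over n samples."""
    rng = np.random.default_rng(seed)
    N, M = Sigma_s.shape[0], W.shape[0]
    s = rng.multivariate_normal(np.zeros(N), Sigma_s, size=n)           # rows are samples
    x = s @ H.T + np.sqrt(sigma_nu2) * rng.standard_normal((n, N))      # eq. 1
    r = x @ W.T + np.sqrt(sigma_delta2) * rng.standard_normal((n, M))   # eq. 2
    s_hat = r @ A.T                                                      # eq. 3
    return np.mean(np.sum((s - s_hat) ** 2, axis=1))                    # eqs. 4-5

rng = np.random.default_rng(1)
N, M = 4, 3
B = rng.standard_normal((N, N))
Sigma_s = B @ B.T / N                      # an arbitrary positive semidefinite covariance
H = np.eye(N) + 0.1 * rng.standard_normal((N, N))
W = rng.standard_normal((M, N))
A = rng.standard_normal((N, M))
print(mse_closed_form(W, A, H, Sigma_s, 0.1, 0.2),
      mse_monte_carlo(W, A, H, Sigma_s, 0.1, 0.2))
```

The two numbers should agree up to Monte Carlo error, which shrinks roughly as $1/\sqrt{n}$.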
3 Results
3.1 Optimal linear decoder
The linear decoder A can be expressed in terms of W, so the optimization reduces to finding the optimal W alone. This follows from the necessary condition for the minimum MSE with respect to A:
$$\frac{\partial E}{\partial A} = 0 \tag{10}$$
$$\Leftrightarrow\quad A_{\text{opt}} = \Sigma_s H^T W^T\left(W\Sigma_x W^T + \sigma_\delta^2 I_M\right)^{-1}. \tag{11}$$
[Figure 1: block diagrams of (a) the generalized problem, (b) the Wiener filtering problem, and (c) the robust coding problem.]
Figure 1: Model diagrams. (a) The generalized problem. Both the encoder W and the decoder A are optimized. (b) In the first subproblem, signal degradation is caused by a fixed linear transform and AWGN. The encoder $W_1$ is optimized; the solution is given by Wiener filtering. (c) In the second subproblem, signal transmission is limited both by AWGN in the channel and by the limited channel dimension, and both the encoder $W_2$ and the decoder A are optimized. The solution is given by robust coding.
This result was shown previously for a special case in which W is convolutional (Ruderman,
1994).
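As a sanity check on eq. 11, the sketch below (NumPy; `optimal_decoder` and `mse_gradient` are hypothetical helper names, and the matrices are random stand-ins) computes $A_{\text{opt}}$ with a linear solve instead of an explicit inverse and confirms that the gradient of eq. 7 with respect to A vanishes there.

```python
import numpy as np

def optimal_decoder(W, H, Sigma_s, sigma_nu2, sigma_delta2):
    """Eq. 11: A_opt = Σs H^T W^T (W Σx W^T + σδ² I_M)^{-1}."""
    N, M = Sigma_s.shape[0], W.shape[0]
    Sigma_x = H @ Sigma_s @ H.T + sigma_nu2 * np.eye(N)
    R = W @ Sigma_x @ W.T + sigma_delta2 * np.eye(M)   # covariance of r; symmetric positive definite
    B = Sigma_s @ H.T @ W.T                            # cross-covariance E[s r^T]
    # A R = B  <=>  R A^T = B^T (R is symmetric), so solve rather than invert R.
    return np.linalg.solve(R, B.T).T

def mse_gradient(W, A, H, Sigma_s, sigma_nu2, sigma_delta2):
    """∂E/∂A of eq. 7: -2 Σs H^T W^T + 2 A (W Σx W^T + σδ² I_M)."""
    N, M = Sigma_s.shape[0], W.shape[0]
    Sigma_x = H @ Sigma_s @ H.T + sigma_nu2 * np.eye(N)
    R = W @ Sigma_x @ W.T + sigma_delta2 * np.eye(M)
    return -2 * Sigma_s @ H.T @ W.T + 2 * A @ R

rng = np.random.default_rng(0)
N, M = 5, 3
B = rng.standard_normal((N, N))
Sigma_s = B @ B.T
H = rng.standard_normal((N, N))
W = rng.standard_normal((M, N))
A_opt = optimal_decoder(W, H, Sigma_s, 0.3, 0.5)
print(np.abs(mse_gradient(W, A_opt, H, Sigma_s, 0.3, 0.5)).max())   # at rounding-error level
```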
3.2 Exact decomposition
Our main result is based on the observation that the minimum MSE can be decomposed as follows (see Wolf and Ziv, 1970, for a general case):
$$E_{\text{opt}} = (\text{Wiener filtering error}) + (\text{Robust coding error}). \tag{12}$$
The "Wiener filtering error" is the theoretical error bound for a linear method that best counteracts input degradation caused by blurring and AWGN; it is achieved by Wiener filtering (Figure 1b). The "robust coding error" is the theoretical bound for a linear method that makes the best use of noisy Gaussian channels of limited dimension (neural population size); it is achieved by robust coding (Lee, 1975; Lee and Petersen, 1976; Doi et al., 2007) (Figure 1c). If robust coding is solved in the setting in which the input signal $y$ is the Wiener estimate $s^*$ (and hence the robust coding reconstruction is $\hat{y} = \hat{s}^*$), then it can be shown that the product of Wiener filtering and robust coding, $W_2W_1$, is the optimal linear transform for the generalized problem (Figure 1).
Proof. We analyze the problem in the whitened signal space because this simplifies the formulas. We rewrite the covariance of the observed signal x using its eigenvalue decomposition,
$$\Sigma_x = E D_x E^T, \tag{13}$$
and rewrite the linear encoder as
$$W = \sqrt{\frac{P}{M}}\, V D_x^{-1/2} E^T, \tag{14}$$
where $D_x^{-1/2}E^T$ whitens the observed signal x, and V is the linear encoder with respect to the whitened signal. Note that the inverse of $D_x$ always exists because of the non-zero input noise ν.
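The whitening parameterization of eqs. 13-14 can be checked numerically. This NumPy sketch uses a random covariance and an arbitrary V (both illustrative) to confirm that $D_x^{-1/2}E^T$ whitens x, and that rescaling V so that $\mathrm{tr}(VV^T) = M$, or so that every row of V has unit norm, reproduces the power constraints of eqs. 8-9.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, P, sigma_nu2 = 6, 4, 2.0, 0.1
B = rng.standard_normal((N, N))
Sigma_s = B @ B.T / N
H = rng.standard_normal((N, N))
Sigma_x = H @ Sigma_s @ H.T + sigma_nu2 * np.eye(N)

d_x, E = np.linalg.eigh(Sigma_x)          # eq. 13; every d_x >= σν² > 0, so D_x^{-1/2} exists
whiten = np.diag(d_x ** -0.5) @ E.T       # D_x^{-1/2} E^T

V = rng.standard_normal((M, N))
V *= np.sqrt(M / np.trace(V @ V.T))       # tr(V V^T) = M
W = np.sqrt(P / M) * V @ whiten           # eq. 14

V_ind = V / np.linalg.norm(V, axis=1, keepdims=True)   # diag(V V^T) = 1_M
W_ind = np.sqrt(P / M) * V_ind @ whiten

print(np.allclose(whiten @ Sigma_x @ whiten.T, np.eye(N)))        # whitened x has identity covariance
print(np.isclose(np.trace(W @ Sigma_x @ W.T), P))                 # total power constraint, eq. 8
print(np.allclose(np.diag(W_ind @ Sigma_x @ W_ind.T), P / M))     # individual power constraint, eq. 9
```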
The first benefit of this parameterization is that the constraints in eqs. 8-9 simplify to
$$\mathrm{tr}(VV^T) = M, \tag{15}$$
$$\mathrm{diag}(VV^T) = \mathbf{1}_M. \tag{16}$$
Also, the decoder in eq. 11 simplifies to:
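$$A = \sqrt{\frac{P}{M}}\,\Sigma_s H^T E D_x^{-1/2} V^T\left(\frac{P}{M}VV^T + \sigma_\delta^2 I_M\right)^{-1} \tag{17}$$
$$\;\;= \sqrt{\frac{P}{M}}\,\Sigma_s H^T E D_x^{-1/2} V^T\left(\sigma_\delta^{-2} I_M - \frac{P}{M}\sigma_\delta^{-2} V C V^T \sigma_\delta^{-2}\right) \tag{18}$$
$$\;\;= \sqrt{\frac{P}{M}}\,\Sigma_s H^T E D_x^{-1/2}\left(I_N - \frac{P}{M}\sigma_\delta^{-2} V^T V C\right)\sigma_\delta^{-2} V^T \tag{19}$$
$$\;\;= \sqrt{\frac{P}{M}}\,\Sigma_s H^T E D_x^{-1/2}\left(I_N - \left(-I_N + C^{-1}\right)C\right)\sigma_\delta^{-2} V^T \tag{20}$$
$$\;\;= \sqrt{\frac{P}{M}}\,\Sigma_s H^T E D_x^{-1/2}\, C\, \sigma_\delta^{-2} V^T \tag{21}$$
where
$$C = \left(I_N + \frac{P}{M}\sigma_\delta^{-2} V^T V\right)^{-1} \tag{22}$$
and we used the Woodbury matrix identity in eq. 18; eq. 20 follows from eq. 22, which gives $\frac{P}{M}\sigma_\delta^{-2}V^TV = C^{-1} - I_N$.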
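The Woodbury step can be verified numerically. Eq. 17 inverts an M×M matrix, whereas eq. 21 needs only the N×N matrix C of eq. 22, which is the cheaper route when the representation is overcomplete (M > N). The following NumPy sketch, with random illustrative matrices and an arbitrary V, checks that eqs. 11, 17, and 21 give the same decoder.

```python
import numpy as np

rng = np.random.default_rng(1)
N, M, P, s_nu2, s_d2 = 5, 7, 3.0, 0.2, 0.5    # M > N: an overcomplete example
B = rng.standard_normal((N, N))
Sigma_s = B @ B.T / N
H = rng.standard_normal((N, N))
Sigma_x = H @ Sigma_s @ H.T + s_nu2 * np.eye(N)
d_x, E = np.linalg.eigh(Sigma_x)              # eq. 13
Dm12 = np.diag(d_x ** -0.5)                   # D_x^{-1/2}
V = rng.standard_normal((M, N))
V *= np.sqrt(M / np.trace(V @ V.T))           # tr(V V^T) = M
k = np.sqrt(P / M)
W = k * V @ Dm12 @ E.T                        # eq. 14

# Eq. 11 and eq. 17: both invert an M x M matrix.
A11 = Sigma_s @ H.T @ W.T @ np.linalg.inv(W @ Sigma_x @ W.T + s_d2 * np.eye(M))
A17 = k * Sigma_s @ H.T @ E @ Dm12 @ V.T @ np.linalg.inv((P / M) * V @ V.T + s_d2 * np.eye(M))
# Eqs. 21-22: only the N x N matrix C is inverted.
C = np.linalg.inv(np.eye(N) + (P / M) / s_d2 * V.T @ V)
A21 = k / s_d2 * Sigma_s @ H.T @ E @ Dm12 @ C @ V.T
print(np.allclose(A11, A17), np.allclose(A17, A21))
```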
Under the mild assumption that the signal covariance $\Sigma_s$ and the fixed linear distortion $H$ share the same eigenvectors,$^a$ we write $\Sigma_s = E\Lambda E^T$ and $H = E\Delta E^T$, and eq. 21 is
$^a$This holds in the application to image coding. Specifically, the statistics of the image signal s are shift-invariant, and the linear distortion is optical blur, so H is convolutional. In such a case, the eigenvectors are given by the Fourier basis functions. More precisely, we further assume periodic boundary conditions for both s and H, so that $\Sigma_s$ and H are circulant in addition to Toeplitz; the eigenvector matrix of a circulant matrix is the DFT matrix. Alternatively, without assuming periodic boundary conditions, a non-circulant Toeplitz matrix can be handled with the circulant approximation (Gray, 2006).
further simplified as:
$$A = \sqrt{\frac{P}{M}}\,\sigma_\delta^{-2}\, E\Lambda\Delta D_x^{-1/2} C V^T. \tag{23}$$
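Footnote a relies on circulant matrices sharing the DFT as eigenvectors. The NumPy sketch below builds a circulant covariance and a circulant blur from hypothetical spectra (`circulant` is an illustrative helper written here, not a library call) and confirms that the same DFT matrix diagonalizes both, which is the shared-eigenvector assumption behind eq. 23.

```python
import numpy as np

def circulant(c):
    """Circulant matrix with first column c: C[i, j] = c[(i - j) mod N]."""
    N = len(c)
    idx = (np.arange(N)[:, None] - np.arange(N)[None, :]) % N
    return c[idx]

N = 8
f = np.minimum(np.arange(N), N - np.arange(N))          # symmetric frequency index 0..N/2
sigma_col = np.fft.ifft(1.0 / (1.0 + f) ** 2).real      # hypothetical 1/f^2-like power spectrum
blur_col = np.fft.ifft(np.exp(-0.3 * f**2)).real        # hypothetical Gaussian-like optical blur
Sigma_s, H = circulant(sigma_col), circulant(blur_col)

F = np.fft.fft(np.eye(N))                               # DFT matrix: F @ x equals np.fft.fft(x)
F_inv = np.conj(F) / N                                  # inverse DFT (F is symmetric)
for Mat in (Sigma_s, H):
    D = F @ Mat @ F_inv
    print(np.allclose(D, np.diag(np.diag(D))))          # diagonal in the same Fourier basis
```

The diagonal of $F\Sigma_sF^{-1}$ recovers the chosen power spectrum, i.e., the eigenvalues in Λ.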
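By substituting eq. 14 and eq. 21 into eq. 7, the MSE can be expressed in terms of V:
$$\begin{aligned}
E &= \mathrm{tr}(\Lambda) - 2\left[\mathrm{tr}(\Lambda^2\Delta^2D_x^{-1}) - \mathrm{tr}(\Lambda^2\Delta^2D_x^{-1}C)\right] \\
&\quad + \left[\mathrm{tr}(\Lambda^2\Delta^2D_x^{-1}) - 2\,\mathrm{tr}(\Lambda^2\Delta^2D_x^{-1}C) + \mathrm{tr}(\Lambda^2\Delta^2D_x^{-1}C^2)\right] \\
&\quad + \left[\mathrm{tr}(\Lambda^2\Delta^2D_x^{-1}C) - \mathrm{tr}(\Lambda^2\Delta^2D_x^{-1}C^2)\right] && \text{(24)} \\
&= \mathrm{tr}(\Lambda) - \mathrm{tr}(\Lambda^2\Delta^2D_x^{-1}) + \mathrm{tr}(\Lambda^2\Delta^2D_x^{-1}C). && \text{(25)}
\end{aligned}$$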
Thus we have arrived at the exact decomposition given in eq. 12. The first term in eq. 25 is the signal variance, and the second term is the (negative) variance of the best linear reconstruction subject to the input degradation, i.e., the variance of the Wiener estimate (Gonzalez and Woods, 2002). Therefore, the first two terms together correspond to the MMSE subject to the input degradation. The third term is the MSE, as a function of V, for transmission through an M-dimensional channel with AWGN, given an N-dimensional input whose spectrum is $\tilde{D}_x = \Lambda^2\Delta^2D_x^{-1}$ (Lee, 1975; Lee and Petersen, 1976; Doi et al., 2007).$^b$ The optimal V that satisfies the total or the individual power constraint can therefore be derived directly from these earlier studies.
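Eq. 25 holds for any V, not only the optimal one, so it can be checked with a random encoder. The sketch below (NumPy; the spectra Λ and Δ, the eigenvector matrix E, and all parameter values are illustrative assumptions) evaluates eq. 7 with the optimal decoder of eq. 11 and compares it with the Wiener term plus $\mathrm{tr}(\Lambda^2\Delta^2D_x^{-1}C)$.

```python
import numpy as np

rng = np.random.default_rng(2)
N, M, P, s_nu2, s_d2 = 6, 4, 2.0, 0.3, 0.4
E, _ = np.linalg.qr(rng.standard_normal((N, N)))   # shared orthonormal eigenvectors (assumed)
lam = 1.0 / np.arange(1, N + 1) ** 2                # Λ: signal spectrum (illustrative)
dlt = np.exp(-0.1 * np.arange(N))                   # Δ: distortion gains (illustrative)
Sigma_s = E @ np.diag(lam) @ E.T
H = E @ np.diag(dlt) @ E.T
d_x = dlt**2 * lam + s_nu2                          # D_x = Δ²Λ + σν² I_N
Sigma_x = E @ np.diag(d_x) @ E.T                    # equals H Σs H^T + σν² I_N

V = rng.standard_normal((M, N))                     # any encoder, not necessarily optimal
V *= np.sqrt(M / np.trace(V @ V.T))
W = np.sqrt(P / M) * V @ np.diag(d_x ** -0.5) @ E.T                            # eq. 14
A = Sigma_s @ H.T @ W.T @ np.linalg.inv(W @ Sigma_x @ W.T + s_d2 * np.eye(M))  # eq. 11
mse = (np.trace(Sigma_s) - 2 * np.trace(A @ W @ H @ Sigma_s)
       + np.trace(A @ W @ Sigma_x @ W.T @ A.T) + s_d2 * np.trace(A @ A.T))      # eq. 7

C = np.linalg.inv(np.eye(N) + (P / M) / s_d2 * V.T @ V)    # eq. 22
lam_star = lam**2 * dlt**2 / d_x                            # diagonal of Λ²Δ²D_x^{-1}
wiener_error = lam.sum() - lam_star.sum()                   # first two terms of eq. 25
robust_error = np.sum(lam_star * np.diag(C))                # tr(Λ²Δ²D_x^{-1} C)
print(np.isclose(mse, wiener_error + robust_error))
```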
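$^b$The corresponding MMSE problem in the original signal space (before whitening) is to find a linear transform $\tilde{W} = \sqrt{P/M}\,V\tilde{D}_x^{-1/2}E^T$, where $\tilde{\Sigma}_x = E\tilde{D}_xE^T$ is the covariance of the input signal. The power constraints are, respectively,
$$\mathrm{tr}(\tilde{W}\tilde{\Sigma}_x\tilde{W}^T) = P \;\Rightarrow\; \mathrm{tr}(VV^T) = M, \tag{26}$$
$$\mathrm{diag}(\tilde{W}\tilde{\Sigma}_x\tilde{W}^T) = \frac{P}{M}\mathbf{1}_M \;\Rightarrow\; \mathrm{diag}(VV^T) = \mathbf{1}_M, \tag{27}$$
namely, those robust coding solutions satisfy the same power constraints as in the generalized problem.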
This implies that the optimal filtering in the MSE sense consists of de-noising and de-blurring the input signal, followed by optimal coding of this cleaned signal, where signal transmission is limited by channel noise and the limited channel dimension. The first filtering stage is separable from the second, implying that input degradation cannot be compensated for by increasing the representational capacity of the channel, for example, by decreasing channel noise or by increasing the channel dimension.
In summary, the MMSE linear transform can be obtained by the following steps:
1. Find the Wiener filter $W_{\text{wiener}}$ given the input signal spectrum, the linear distortion, and the amplitude of the input noise.
2. Find the robust coding solution $W_{\text{robust}}$ given the Wiener estimate spectrum, the amplitude of the channel noise, and the channel dimension.
3. The optimal linear transform for the generalized problem is $W_{\text{mmse}} = W_{\text{robust}}W_{\text{wiener}}$.
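The three steps can be sketched end to end in NumPy. This is a minimal illustration for the total power constraint only. It assumes shared eigenvectors for $\Sigma_s$ and H, takes the orthogonal matrix in eq. 29 to be the identity, and uses the gain formulas of eqs. 30-31 from Section 3.3 with a simple descending search for K (an assumption of this sketch). `robust_gains` and `mse_with_optimal_decoder` are hypothetical helpers, and the spectra are illustrative. The sketch checks that $W_{\text{mmse}} = W_{\text{robust}}W_{\text{wiener}}$ meets the power budget and that its MSE equals the Wiener error plus the robust coding error.

```python
import numpy as np

def robust_gains(lam_star, P, M, s_d2):
    """Squared gains of eqs. 30-31 (total power constraint); lam_star sorted descending."""
    sq = np.sqrt(lam_star)
    for K in range(min(M, len(lam_star)), 0, -1):           # largest K whose K-th λ* clears l0
        l0 = (sq[:K].mean() / (P / (K * s_d2) + 1)) ** 2      # eq. 31
        if lam_star[K - 1] > l0:
            break
    g2 = np.zeros_like(lam_star)
    g2[:K] = s_d2 * (1 / np.sqrt(l0) - 1 / sq[:K]) / sq[:K]  # eq. 30a; eq. 30b leaves zeros
    return g2, K, l0

rng = np.random.default_rng(3)
N, M, P, s_nu2, s_d2 = 8, 5, 4.0, 0.05, 0.5
E, _ = np.linalg.qr(rng.standard_normal((N, N)))   # shared eigenvectors (assumed)
lam = 1.0 / np.arange(1, N + 1) ** 2               # Λ (illustrative)
dlt = np.exp(-0.05 * np.arange(N))                 # Δ (illustrative)
Sigma_s, H = E @ np.diag(lam) @ E.T, E @ np.diag(dlt) @ E.T
d_x = dlt**2 * lam + s_nu2
Sigma_x = H @ Sigma_s @ H.T + s_nu2 * np.eye(N)

# Step 1: Wiener filter (eq. 28) and the spectrum Λ* of the Wiener estimate.
W_wiener = E @ np.diag(dlt * lam / d_x) @ E.T
lam_star = dlt**2 * lam**2 / d_x

# Step 2: robust coding of y = s* (eq. 29 with the orthogonal matrix set to I_M).
order = np.argsort(lam_star)[::-1]                 # component indices by descending λ*
g2_sorted, K, l0 = robust_gains(lam_star[order], P, M, s_d2)
G = np.zeros((M, N))
for j, i in enumerate(order[:min(M, N)]):
    G[j, i] = np.sqrt(g2_sorted[j])                # channel j carries component i
W_robust = G @ E.T

# Step 3: compose the two stages.
W_mmse = W_robust @ W_wiener

def mse_with_optimal_decoder(W):
    """Eq. 7 at A = A_opt (eq. 11): tr(Σs) - tr(B R^{-1} B^T), with B = Σs H^T W^T."""
    R = W @ Sigma_x @ W.T + s_d2 * np.eye(W.shape[0])
    B = Sigma_s @ H.T @ W.T
    return np.trace(Sigma_s) - np.trace(B @ np.linalg.solve(R, B.T))

ls = lam_star[order]
robust_error = np.sum(np.where(np.arange(N) < K, np.sqrt(l0 * ls), ls))   # eq. 33, summed
predicted = (lam.sum() - lam_star.sum()) + robust_error                    # eq. 12
print(np.isclose(np.trace(W_mmse @ Sigma_x @ W_mmse.T), P))   # power budget, eq. 8
print(np.isclose(mse_with_optimal_decoder(W_mmse), predicted))
```

Under the individual power constraint, the identity would have to be replaced by an orthogonal matrix that spreads the power evenly over the coding units; that step is not covered by this sketch.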
3.3 Spectral characterization
The exact decomposition into Wiener filtering and robust coding allows us to gain insight
into how the reconstruction error is minimized, by analyzing how the power spectrum (eigen-
values) of a generic signal is transformed through the two serial processes.
The first process, Wiener filtering, can be expressed as
$$W_{\text{wiener}} = E\Delta\Lambda D_x^{-1}E^T \tag{28}$$
where $D_x = \Delta^2\Lambda + \sigma_\nu^2 I_N$. Recall that the spectrum of the Wiener estimate $s^*$ is $\Delta^2\Lambda^2 D_x^{-1} \equiv \Lambda^* = \mathrm{diag}(\lambda_1^*, \cdots, \lambda_N^*)$, and that how the original signal is restored against linear distortion and input noise is well understood.
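For a concrete check of eq. 28, the NumPy sketch below (illustrative spectra and a random shared eigenvector matrix) confirms that $E\Delta\Lambda D_x^{-1}E^T$ coincides with the familiar Wiener form $\Sigma_s H^T\Sigma_x^{-1}$, that the Wiener estimate has covariance $E\Lambda^*E^T$, and that the per-component Wiener error is $\lambda_i\sigma_\nu^2/(\Delta_{ii}^2\lambda_i + \sigma_\nu^2)$, where $\Delta_{ii}$ is the i-th diagonal entry of Δ.

```python
import numpy as np

rng = np.random.default_rng(4)
N, s_nu2 = 6, 0.1
E, _ = np.linalg.qr(rng.standard_normal((N, N)))   # shared eigenvectors (assumed)
lam = 1.0 / np.arange(1, N + 1) ** 2                # Λ (illustrative)
dlt = np.exp(-0.2 * np.arange(N))                   # Δ (illustrative blur)
Sigma_s, H = E @ np.diag(lam) @ E.T, E @ np.diag(dlt) @ E.T
Sigma_x = H @ Sigma_s @ H.T + s_nu2 * np.eye(N)
d_x = dlt**2 * lam + s_nu2                          # D_x = Δ²Λ + σν² I_N

W_wiener = E @ np.diag(dlt * lam / d_x) @ E.T       # eq. 28
lam_star = dlt**2 * lam**2 / d_x                    # Λ* = Δ²Λ²D_x^{-1}

print(np.allclose(W_wiener, Sigma_s @ H.T @ np.linalg.inv(Sigma_x)))             # Σs H^T Σx^{-1}
print(np.allclose(W_wiener @ Sigma_x @ W_wiener.T, E @ np.diag(lam_star) @ E.T))  # Cov(s*)
print(np.allclose(lam - lam_star, lam * s_nu2 / d_x))                            # Wiener error
```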
The second process, robust coding, is given by
$$W_{\text{robust}} = \mathcal{P}\left(\sqrt{\frac{P}{M}}\,\Omega\,\Lambda^{*-1/2}\right)E^T \tag{29}$$
where $\mathcal{P}$ is some M-dimensional orthogonal matrix$^c$ and $\sqrt{P/M}\,\Omega\,\Lambda^{*-1/2}$ is the gain applied to the input signal in the eigenspace (note that the input signal of this second process, $y = s^*$, is represented in the signal's eigenspace by $E^T$ in eq. 29). The i-th element of this gain (squared, for clarity) is given by
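$$\frac{P}{M}\,\omega_i^2\,\frac{1}{\lambda_i^*} = \begin{cases} \left(\dfrac{1}{\sqrt{l_0}} - \dfrac{1}{\sqrt{\lambda_i^*}}\right)\dfrac{\sigma_\delta^2}{\sqrt{\lambda_i^*}}, & \text{if } i = 1, \ldots, K, \qquad \text{(30a)}\\[6pt] 0, & \text{otherwise}, \qquad \text{(30b)} \end{cases}$$
where
$$l_0 = \left(\frac{1}{\frac{P}{K\sigma_\delta^2} + 1}\;\frac{1}{K}\sum_{i=1}^{K}\sqrt{\lambda_i^*}\right)^2 \tag{31}$$
is the threshold: only those signal components whose eigenvalues are greater than this value are encoded by robust coding, and K is the total number of such components (without loss of generality, we assume that the $\lambda_i^*$ are sorted in descending order; see Lee, 1975; Lee and Petersen, 1976, for the derivation). K must be computed numerically.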
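Since K has no closed form, one simple numerical approach (an assumption of this sketch, not necessarily the procedure of Lee, 1975) is to count down from min(M, N) and keep the first K whose K-th eigenvalue exceeds the threshold of eq. 31; because the eigenvalues are sorted, the encoded set is then a leading block of components. The NumPy helper `robust_gains` below is hypothetical, covers only the total power constraint, and is applied to an illustrative random spectrum.

```python
import numpy as np

def robust_gains(lam_star, P, M, s_d2):
    """Squared gains of eqs. 30-31 (total power constraint).
    lam_star must be sorted in descending order."""
    sq = np.sqrt(lam_star)
    for K in range(min(M, len(lam_star)), 0, -1):           # try the largest K first
        l0 = (sq[:K].mean() / (P / (K * s_d2) + 1)) ** 2      # eq. 31
        if lam_star[K - 1] > l0:                              # K-th component clears threshold
            break
    g2 = np.zeros_like(lam_star)
    g2[:K] = s_d2 * (1 / np.sqrt(l0) - 1 / sq[:K]) / sq[:K]  # eq. 30a; eq. 30b leaves zeros
    return g2, K, l0

lam_star = np.sort(np.random.default_rng(5).gamma(0.5, size=20))[::-1]   # illustrative spectrum
P, M, s_d2 = 3.0, 12, 0.5
g2, K, l0 = robust_gains(lam_star, P, M, s_d2)
u = g2 * lam_star                       # output signal power per component
print(K, l0, u.sum())                   # u.sum() recovers the total power P
```

On the encoded components, the per-component error $\lambda_i^*/(1 + u_i/\sigma_\delta^2)$ reduces to $\sqrt{l_0\lambda_i^*}$, the form used later in the error analysis.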
Our analysis reduces to that of Ruderman (1994) when the linear encoder W is restricted to be convolutional, namely, when the eigenvector matrix E is the Fourier basis and the M-dimensional orthogonal matrix $\mathcal{P}$ is also given by the same Fourier basis E (note that E is N-dimensional, so this case requires M = N).
Unlike Wiener filtering, robust coding is much less widely known, and its characteristics have not been fully examined except for one- or two-dimensional signal problems (Doi et al., 2007). Next we illustrate four major characteristics of robust coding for a general N-dimensional signal.
$^c$Under the total power constraint, the robust coding solution is unique up to the orthogonal matrix $\mathcal{P}$ (eq. 29); note that $\mathcal{P}$ cancels out of eq. 8. Under the individual power constraint, $\mathcal{P}$ acts to distribute the total power evenly over the coding units. This is known as an inverse eigenvalue problem (Chu and Golub, 2005), and the existence of such an orthogonal matrix and an algorithm to find it were shown by Lee (1975) and Lee and Petersen (1976).
(i) The threshold $l_0$ defines which input signal dimensions are encoded. Figure 2a shows the signal's eigenvalues (power spectrum); the components that exceed $l_0$ are encoded and the rest are discarded by robust coding. The corresponding "critical" index (or frequency), K, is indicated by the gray vertical line in all panels of Figure 2. From eq. 31, we observe the following:
• The threshold $l_0$ gets lower (and hence the critical frequency K gets higher; Figure 2a) if a larger total power P (eq. 8) is available to the neural population. The total power can be increased by increasing the power available to individual neurons, by increasing the neural population size while the individual neural power is fixed, or both. Note that overcompleteness in our model is useful for minimizing the MSE because it can increase the total power. If the overcompleteness is increased while the total power is fixed, then the MMSE does not change. (This could still be beneficial for other reasons, for example, if a large number of low-power coding units is more economical than a small number of high-power coding units.) The changes in the encoder and the critical frequency caused by doubling the population size at a fixed individual neural power are illustrated in Figure 2b, and the resulting change in the error ratio is shown in Figure 2f.
• The threshold gets lower if the channel noise $\sigma_\delta^2$ is smaller. (Note that $\frac{P}{K\sigma_\delta^2}$ in eq. 31 is the effective SNR, where K is the dimension of the subspace represented by the neural population; it differs from the apparent, average SNR of the neural population, which is given by $\frac{P}{M\sigma_\delta^2}$, where M is the neural population size.)
• The thresholding depends on the anisotropy (or distribution) of the input spectrum $\lambda_i^*$, and there is no thresholding when the spectrum is isotropic. This was analyzed thoroughly for two-dimensional signals (Doi et al., 2007).
(ii) The squared gain of the input signal (eq. 30, illustrated in Figure 2b) has its maximum at the frequency L whose eigenvalue is $\lambda_L^* = 4l_0$ (Figure 2a). (This can be shown from the first- and second-order derivatives of eq. 30a with respect to $\lambda_i^*$.) Interestingly, a band-pass characteristic emerges under the MMSE objective for a $1/f^2$ signal. In contrast, under the information maximization objective, the optimal filtering is whitening (Atick, 1992; Cover and Thomas, 2006), which is high-pass filtering for a $1/f^2$ signal (its squared gain is proportional to $f^2$).
(iii) The power spectrum of the noisy representation r (given by the product of the squared gain of robust coding, eq. 30, and the power spectrum of the signal, $\lambda_i^*$, plus the channel noise spectrum, $\sigma_\delta^2$) is proportional to the square root of the signal spectrum if the signal is above the threshold; otherwise it is identical to the noise spectrum, i.e., those components are pure noise (Figure 2c):
$$\frac{P}{M}\,\omega_i^2 + \sigma_\delta^2 = \begin{cases} \dfrac{\sigma_\delta^2}{\sqrt{l_0}}\sqrt{\lambda_i^*}, & \text{if } i = 1, \ldots, K, \qquad \text{(32a)}\\[6pt] \sigma_\delta^2, & \text{otherwise}. \qquad \text{(32b)} \end{cases}$$
This may be seen as intermediate between the original signal spectrum $\lambda_i^*$ and the flat spectrum generated by whitening. This "half-whitening" is a distinctive feature of robust coding.
(iv) The reconstruction is more precise for those components whose power is larger. Multiplying the noisy representation (Figure 2c) by the decoder (Figure 2d) yields the reconstructed signal (Figure 2e). The reconstruction error (also shown in Figure 2e) is
$$(\text{error variance})_i = \begin{cases} \sqrt{l_0\lambda_i^*}, & \text{if } i = 1, \ldots, K, \qquad \text{(33a)}\\[4pt] \lambda_i^*, & \text{otherwise}, \qquad \text{(33b)} \end{cases}$$
implying that the ratio of the error to the data variance at each component i is (Figure 2f):
$$\frac{(\text{error variance})_i}{(\text{data variance})_i} = \begin{cases} \sqrt{l_0/\lambda_i^*}, & \text{if } i = 1, \ldots, K, \qquad \text{(34a)}\\[4pt] 1, & \text{otherwise}. \qquad \text{(34b)} \end{cases}$$
This results from the optimal allocation of a limited representational resource: more resource is allocated to signal components with higher power. In contrast, the information maximization objective leads to whitening instead of robust coding, in which the reconstruction spectrum is proportional to the input spectrum and the error ratio is constant across signal components, even if the power spectrum of the signal is not uniform (Doi and Lewicki, 2005). This may be interpreted as allocating the representational resources evenly over the signal components even if their strengths differ. The suboptimality of whitening in the MSE sense has been observed in the literature (Bethge, 2006; Doi et al., 2007; Eichhorn et al., 2009), and our analysis explains why this is the case.
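Points (i)-(iv) can be reproduced numerically for the setting of Figure 2: $\lambda_i^* = 1/i^2$, N = 100, average channel SNR $P/(M\sigma_\delta^2) = 1$, and M = 100 or M = 200. The NumPy sketch below assumes $\sigma_\delta^2 = 1$ (so P = M) and repeats a hypothetical `robust_gains` helper that searches K in descending order (an assumption, not necessarily the original procedure). It computes the noisy representation spectrum of eq. 32 and the per-component error $\lambda_i^*/(1 + u_i/\sigma_\delta^2)$, whose ratio to the data variance should match eq. 34. For the gain peak, writing $t = (\lambda^*)^{-1/2}$ turns eq. 30a into $\sigma_\delta^2(t/\sqrt{l_0} - t^2)$, which is maximized at $t = 1/(2\sqrt{l_0})$, i.e., $\lambda^* = 4l_0$; the last lines confirm this on a fine grid.

```python
import numpy as np

def robust_gains(lam_star, P, M, s_d2):
    """Squared gains of eqs. 30-31 (total power constraint); lam_star sorted descending."""
    sq = np.sqrt(lam_star)
    for K in range(min(M, len(lam_star)), 0, -1):
        l0 = (sq[:K].mean() / (P / (K * s_d2) + 1)) ** 2      # eq. 31
        if lam_star[K - 1] > l0:
            break
    g2 = np.zeros_like(lam_star)
    g2[:K] = s_d2 * (1 / np.sqrt(l0) - 1 / sq[:K]) / sq[:K]  # eq. 30
    return g2, K, l0

N, s_d2 = 100, 1.0
f = np.arange(1, N + 1)                    # index i identified with frequency f
lam_star = 1.0 / f**2                      # 1/f^2 spectrum, as in Figure 2a
results = {}
for M in (100, 200):                       # complete and 2x overcomplete populations
    P = M * s_d2                           # fixed per-neuron power: P / (M σδ²) = 1
    g2, K, l0 = robust_gains(lam_star, P, M, s_d2)
    u = g2 * lam_star                      # encoder output power per component
    noisy = u + s_d2                       # spectrum of r (eq. 32)
    err = lam_star / (1 + u / s_d2)        # per-component reconstruction error
    results[M] = {"K": K, "l0": l0, "noisy": noisy, "ratio": err / lam_star}
    print(M, K, l0)

# Point (ii): the continuous squared gain of eq. 30a peaks at λ* = 4 l0.
l0 = results[100]["l0"]
grid = np.linspace(1.0001 * l0, 50 * l0, 200_001)
g2_grid = s_d2 * (1 / np.sqrt(l0) - 1 / np.sqrt(grid)) / np.sqrt(grid)
peak_ratio = grid[np.argmax(g2_grid)] / l0
print(peak_ratio)                          # close to 4
```

Doubling M at fixed per-neuron power raises P, which lowers $l_0$ and moves the critical frequency K upward, as in Figure 2b.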
4 Conclusion
We investigated how to optimally organize a population of noisy neurons in order to represent
noisy input signals. Our analysis provides an exact solution and its characterization under
an idealized setting, which is applicable in a wide range of conditions. For example, the
degradation of input signals can model optical blur and additive noise in image formation,
and could be set from measurements in a specific system. Similarly, the neural (channel)
noise and population size can also be determined for the system of interest. This provides
a way to predict the optimal code for a wide variety of biological systems. The application
need not be restricted to vision, or even to biological systems, and could be used to design
optimal signal processors.
Acknowledgment
We would like to thank the reviewer for pointing out the prior study on the decomposition
in a general case (Wolf and Ziv, 1970).
[Figure 2: six panels plotting power against frequency: (a) Input signal; (b) Encoder; (c) Noisy representation (encoder output, noise, noisy representation); (d) Decoder; (e) Reconstruction (input, reconstruction, error); (f) Error ratio.]
Figure 2: Spectral analysis of robust coding. (a) Input signal $y$ (see Figure 1 for notation). In this example, its spectrum $\lambda_i^*$ is given by $1/f^2$, as in photographic images of natural scenes (i.e., $\lambda_i^* = 1/i^2$, equating the index i with the frequency f). The input and output dimensions are N = 100 and M = 100, respectively (hence a complete representation), and the variance of the channel noise, $\sigma_\delta^2$, is set so that the average SNR of the channel is $\frac{P}{M\sigma_\delta^2} = 1$. Two points on the horizontal axis indicate the critical frequency, K (closed circle), and the maximizer of the encoder spectrum, L (open circle), respectively (see text). (b) Encoder $W_{\text{robust}}$. We show the encoder spectra with M = 100 (black curve) and with M = 200 (gray curve; a 2× overcomplete representation), with the corresponding critical frequencies, K and K′, respectively. The maximizer L is shown only for the complete representation. (c) Noisy representation r. The encoder output $W_{\text{robust}}s$ and the channel noise δ are also shown. (d) Decoder A. (e) Reconstruction $\hat{y}$, given by the product of the noisy representation (c) and the decoder (d). (f) The error ratio for each frequency, as defined in eq. 34. With the $1/f^2$ power spectrum of the input signal, the ratio for the encoded components (eq. 34a) is $\sqrt{l_0}\,f$ (note that this plot uses linear axes to show the linearity). As in (b), we illustrate two cases: complete (black) and 2× overcomplete (gray) representations.
References
Atick, J. J. (1992). Could information theory provide an ecological theory of sensory processing? Network, 3:213–251.
Atick, J. J., Li, Z., and Redlich, A. N. (1990). Color coding and its interaction with spatiotemporal processing in the retina. Technical Report IASSNS-HEP-90/75, Institute for Advanced Study.
Atick, J. J. and Redlich, A. N. (1990). Towards a theory of early visual processing. Neural
Computation, 2:308–320.
Atick, J. J. and Redlich, A. N. (1992). What does the retina know about natural scenes?
Neural Computation, 4:196–210.
Bethge, M. (2006). Factorial coding of natural images: how effective are linear models in
removing higher-order dependencies? J. Opt. Soc. Am. A, 23(6):1253–1268.
Chu, M. T. and Golub, G. H. (2005). Inverse Eigenvalue Problems. Oxford University Press.
Cover, T. M. and Thomas, J. A. (2006). Elements of Information Theory. John Wiley &
Sons, New York, 2nd edition.
Dayan, P. and Abbott, L. F. (2001). Theoretical Neuroscience: Computational and Mathematical Modeling of Neural Systems. The MIT Press, London.
Dobrushin, R. L. and Tsybakov, B. S. (1962). Information transmission with additional
noise. IRE Transactions on Information Theory, 8:293–304.
Doi, E., Balcan, D. C., and Lewicki, M. S. (2007). Robust coding over noisy overcomplete
channels. IEEE Transactions on Image Processing, 16:442–452.