Conference Paper

Least Squares Equalizer Design under Consideration of Tail Effects

Stefan Goetze¹, Markus Kallinger², Alfred Mertins³, and Karl-Dirk Kammeyer¹
¹University of Bremen, Dept. of Communications Engineering, D-28334 Bremen, Email: goetze@uni-bremen.de
²Fraunhofer Institute for Integrated Circuits, D-91058 Erlangen, Email: markus.kallinger@iis.fraunhofer.de
³University of Lübeck, Institute for Signal Processing, D-23538 Lübeck, Email: alfred.mertins@isip.uni-luebeck.de
Abstract
Modern high-quality hands-free telecommunication systems have to cope with several real-world problems, such as corruption of the desired signal by additive noise, acoustic echoes, and reverberation. This paper addresses the mutual influence of the subsystems for Acoustic Echo Cancellation and Listening Room Compensation (LRC). In acoustic systems for LRC, the equalizer is placed in front of the loudspeaker. To compensate for the influence of the room impulse response (RIR) at the position of the reference microphone, where the human user is located, the equalizer needs an estimate of the RIR. Since the RIR is identified by the acoustic echo canceller (AEC) anyway, its estimate can be used to design the equalizer. This contribution investigates how the quality of equalization depends on the degree of system identification. Furthermore, the influence of the equalizer on the echo canceller is analyzed.
Listening Room Compensation
Figure 1 shows the basic setup for an LRC filter $\mathbf{c}_{\mathrm{EQ}}$ preceding the RIR $\mathbf{h}$, whose influence has to be compensated.
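[Figure 1: Least-squares equalizer for Listening Room Compensation. Block diagram: the input s[k] passes through the equalizer c_EQ, giving x[k], and then through the RIR h in the near-end room, giving y[k]; in parallel, s[k] passes through the desired system d, giving ŷ[k]; the difference of y[k] and ŷ[k] is the error e_EQ[k].]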
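By minimizing the mean square error of
$$e_{\mathrm{EQ}}[k] = \mathbf{s}^T[k]\,\mathbf{H}\,\mathbf{c}_{\mathrm{EQ}} - \mathbf{s}^T[k]\,\mathbf{d} \tag{1}$$
with the definitions
$$\mathbf{s}[k] = [\, s[k],\; s[k-1],\; \ldots,\; s[k - L_h - L_{c,\mathrm{EQ}} + 2] \,]^T \tag{2}$$
$$\mathbf{c}_{\mathrm{EQ}} = [\, c_{\mathrm{EQ},0},\; c_{\mathrm{EQ},1},\; \ldots,\; c_{\mathrm{EQ},L_{c,\mathrm{EQ}}-1} \,]^T \tag{3}$$
$$\mathbf{d} = [\, \underbrace{0, \ldots, 0}_{k_0},\; d[0],\; d[1],\; \ldots,\; d[L_d-1],\; \underbrace{0, \ldots, 0}_{L_h + L_{c,\mathrm{EQ}} - 1 - L_d - k_0} \,]^T \tag{4}$$
and the convolution matrix $\mathbf{H}$ of dimension $(L_h + L_{c,\mathrm{EQ}} - 1) \times L_{c,\mathrm{EQ}}$, we get the well-known least-squares equalizer
$$\mathbf{c}_{\mathrm{EQ}} = \mathbf{H}^{+}\mathbf{d} \tag{5}$$
for a white noise input $s[k]$. In (5), $\mathbf{H}^{+}$ denotes the Moore-Penrose pseudoinverse of the channel matrix and $\mathbf{d}$ is the desired system, which should be approximated by the concatenation of $\mathbf{c}_{\mathrm{EQ}}$ and $\mathbf{h}$. Here, $\mathbf{d}$ is chosen as a 10th-order Butterworth bandpass with band limits at 200 Hz and 3700 Hz for a sampling frequency of $f_s = 8$ kHz. The lengths of the RIR, the LRC filter, and the desired system $\mathbf{d}$ are denoted by $L_h$, $L_{c,\mathrm{EQ}}$, and $L_d$, respectively.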
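The design rule (5) can be illustrated with a short NumPy sketch. It is only a toy example under stated assumptions: the function names `convolution_matrix` and `ls_equalizer`, the random decaying "RIR", the filter lengths, and the delay `k0` are all hypothetical, and the desired system is a delayed unit impulse rather than the Butterworth bandpass used in the paper.

```python
import numpy as np


def convolution_matrix(h, n_cols):
    """Build H of shape (len(h) + n_cols - 1, n_cols) so that H @ c == np.convolve(h, c)."""
    h = np.asarray(h, dtype=float)
    H = np.zeros((len(h) + n_cols - 1, n_cols))
    for j in range(n_cols):
        H[j:j + len(h), j] = h  # column j holds h delayed by j samples
    return H


def ls_equalizer(h, eq_len, d, k0):
    """Least-squares equalizer c_EQ = pinv(H) @ d_padded, following (4) and (5)."""
    H = convolution_matrix(h, eq_len)
    d_padded = np.zeros(H.shape[0])
    d_padded[k0:k0 + len(d)] = d  # k0 leading zeros, then d, then trailing zeros
    c_eq = np.linalg.pinv(H) @ d_padded
    return c_eq, H, d_padded


# Toy example (all values hypothetical): exponentially decaying random "RIR".
rng = np.random.default_rng(0)
h = rng.standard_normal(32) * np.exp(-0.2 * np.arange(32))
c_eq, H, d_padded = ls_equalizer(h, eq_len=64, d=np.array([1.0]), k0=16)
residual = H @ c_eq - d_padded
print("residual energy:", float(residual @ residual))
```

The residual `H @ c_eq - d_padded` is the part of the desired response that the equalized system cannot reproduce; at the least-squares solution it is orthogonal to the columns of `H`.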
System Identification by an Acoustic Echo Canceller
For LRC, an estimate of the RIR in eq. (5) is needed. It can be delivered by the AEC, since the estimate $\hat{\psi}[k]$ of the echo is obtained by system identification anyway.
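[Figure 2: System for Listening Room Compensation with an Acoustic Echo Canceller for system identification. Block diagram: s[k] passes through the equalizer c_EQ[k], giving x[k], which drives the RIR h[k] in the near-end room and produces the echo ψ[k]; the AEC filter c_AEC[k] produces the estimate ψ̂[k]; their difference is the error e_AEC[k]. Inset: RIR h[k] over k (0 to 4000), split into the first L_c coefficients and a tail of length L_t.]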
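Since the length $L_h$ of the RIR which has to be identified is greater than the length $L_c$ of the identification filter, the system identification will be biased for a nonwhite input $x[k]$. This is known from echo cancellation as the tail effect [1]. We split up the RIR into a part $\mathbf{h}_c[k]$, which can be modeled by the AEC, and a tail $\mathbf{h}_t[k]$ according to Figure 2. By minimizing the power of the AEC error $E\{e_{\mathrm{AEC}}^2[k]\}$ with the error signal
$$e_{\mathrm{AEC}}[k] = \mathbf{h}_c^T[k]\,\mathbf{x}_c[k] - \mathbf{c}_{\mathrm{AEC}}^T[k]\,\mathbf{x}_c[k] + \mathbf{h}_t^T[k]\,\mathbf{x}_t[k] \tag{6}$$
and the signal and coefficient vectors
$$\mathbf{x}_c[k] = [\, x[k],\; x[k-1],\; \ldots,\; x[k - L_c + 1] \,]^T \tag{7}$$
$$\mathbf{x}_t[k] = [\, x[k - L_c],\; \ldots,\; x[k - L_c - L_t + 1] \,]^T \tag{8}$$
$$\mathbf{h}_c[k] = [\, h_0[k],\; h_1[k],\; \ldots,\; h_{L_c-1}[k] \,]^T \tag{9}$$
$$\mathbf{h}_t[k] = [\, h_{L_c}[k],\; h_{L_c+1}[k],\; \ldots,\; h_{L_h-1}[k] \,]^T \tag{10}$$
$$\mathbf{c}_{\mathrm{AEC}}[k] = [\, c_{\mathrm{AEC},0}[k],\; c_{\mathrm{AEC},1}[k],\; \ldots,\; c_{\mathrm{AEC},L_c-1}[k] \,]^T \tag{11}$$
we obtain
$$\mathbf{c}_{\mathrm{AEC}}[k] = \mathbf{h}_c[k] + \left( E\{\mathbf{x}_c[k]\,\mathbf{x}_c^T[k]\} \right)^{-1} E\{\mathbf{x}_c[k]\,\mathbf{x}_t^T[k]\}\,\mathbf{h}_t[k]. \tag{12}$$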
From equation (12) we see that exact identification of the RIR part $\mathbf{h}_c[k]$ is only possible for a white input signal, since the cross-correlation matrix $E\{\mathbf{x}_c[k]\,\mathbf{x}_t^T[k]\}$ is zero only for a white input. The more strongly the early part of the input signal $\mathbf{x}_c[k]$ is correlated with the late part $\mathbf{x}_t[k]$, the stronger the influence of the tail $\mathbf{h}_t[k]$. As we can see from Figure 2, further correlation is introduced by the equalizer in the input path of the AEC.
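The effect of the tail in (12) can be made concrete with a small numerical sketch. It evaluates (12) from a given autocorrelation sequence of a stationary input, assuming $L_t = L_h - L_c$. The function names, the toy RIR, and the first-order autoregressive (AR(1)) input model with coefficient 0.9 are assumptions made for illustration only; they are not taken from the paper.

```python
import numpy as np


def aec_wiener_solution(h, L_c, autocorr):
    """Evaluate (12): c_AEC = h_c + E{x_c x_c^T}^{-1} E{x_c x_t^T} h_t."""
    h = np.asarray(h, dtype=float)
    h_c, h_t = h[:L_c], h[L_c:]
    i = np.arange(L_c)[:, None]           # row index: delay inside x_c
    j_c = np.arange(L_c)[None, :]         # column index for x_c
    j_t = np.arange(len(h_t))[None, :]    # column index for x_t
    R_cc = autocorr(i - j_c)              # E{x_c x_c^T}, entries r[i - j]
    R_ct = autocorr(L_c + j_t - i)        # E{x_c x_t^T}, entries r[L_c + j - i]
    return h_c + np.linalg.solve(R_cc, R_ct @ h_t)


def white_autocorr(lag):
    """Autocorrelation of unit-variance white noise."""
    return (lag == 0).astype(float)


def ar1_autocorr(lag, a=0.9):
    """Autocorrelation of an AR(1) process x[k] = a*x[k-1] + w[k] (assumed input model)."""
    return a ** np.abs(lag) / (1.0 - a ** 2)


# Toy RIR (hypothetical): decaying noise of length 64, AEC models the first 32 taps.
h = np.exp(-0.05 * np.arange(64)) * np.random.default_rng(1).standard_normal(64)
L_c = 32
bias_white = aec_wiener_solution(h, L_c, white_autocorr) - h[:L_c]
bias_ar1 = aec_wiener_solution(h, L_c, ar1_autocorr) - h[:L_c]
print(np.max(np.abs(bias_white)), np.max(np.abs(bias_ar1)))
```

For the white input the cross-correlation matrix vanishes and the bias is zero; for the correlated input the estimate deviates from the true first `L_c` coefficients.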
Simulation Results
The RIR was simulated with a reverberation time of $\tau_{60} = 300$ ms. The filter orders of the AEC and the equalizer (EQ) were 1024 and 2048, respectively. White Gaussian noise and a recorded speech signal (male speaker) were used as input signals.
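The paper does not state how the RIR was simulated. For readers who want to experiment, one simple stand-in is white Gaussian noise shaped by an exponential envelope that decays by 60 dB within the reverberation time. The sketch below is such a stand-in only; the function name `synthetic_rir`, the RIR length, and the random seed are hypothetical choices.

```python
import numpy as np


def synthetic_rir(length, t60, fs, seed=0):
    """White Gaussian noise under an exponential envelope that drops 60 dB after t60 seconds."""
    k = np.arange(length)
    envelope = 10.0 ** (-3.0 * k / (t60 * fs))  # amplitude 1e-3 (-60 dB) at k = t60 * fs
    noise = np.random.default_rng(seed).standard_normal(length)
    return noise * envelope, envelope


fs = 8000    # sampling frequency from the paper
t60 = 0.3    # reverberation time from the paper
h, envelope = synthetic_rir(length=4000, t60=t60, fs=fs)  # length is a hypothetical choice
```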
AEC convergence
The convergence of the AEC is influenced by the additional coloration introduced by the EQ.
[Figure 3: Relative system misalignment D_dB[k]. Axes: D_dB[k] in dB (about -30 to 0) over the sample index k (0 to 2.5·10^4). Curves: noise input with EQ off, noise input with EQ on, speech input with EQ off, speech input with EQ on.]
Figure 3 shows the relative system misalignment
$$D_{\mathrm{dB}}[k] = 10 \cdot \log_{10} \frac{\|\mathbf{h}[k] - \mathbf{c}_{\mathrm{AEC}}[k]\|^2}{\|\mathbf{h}[k]\|^2} \tag{13}$$
with the quadratic vector norm $\|\mathbf{h}[k]\|^2 = \mathbf{h}^T[k]\,\mathbf{h}[k]$ for the two input signals $s[k]$ (white noise or speech) and for the cases of active and inactive EQ. If the EQ is switched off and the system input is white, the AEC reaches the best system identification and the fastest convergence. If we switch on the EQ, both the convergence speed and the achievable degree of system identification decrease. This is due to the correlation introduced by the EQ filter, as we can see from (12). The same tendency can be observed for speech input.
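A minimal sketch of the misalignment measure (13) follows. Because the AEC filter is shorter than the RIR, the sketch zero-pads the AEC coefficients to the RIR length before taking the difference; this padding, the function name `misalignment_db`, and the toy vectors are assumptions of the example and are not stated in the paper.

```python
import numpy as np


def misalignment_db(h, c_aec):
    """Relative system misalignment (13) in dB."""
    h = np.asarray(h, dtype=float)
    c = np.zeros_like(h)
    c[:len(c_aec)] = c_aec  # assumption: zero-pad the shorter AEC filter to the RIR length
    err = h - c
    return 10.0 * np.log10((err @ err) / (h @ h))


# Toy RIR (hypothetical values).
h = np.array([1.0, 0.5, 0.25, 0.125])
print(misalignment_db(h, np.zeros(2)))  # nothing identified yet
print(misalignment_db(h, h[:2]))        # first two taps identified exactly, tail unmodeled
```

Even with the first taps identified exactly, the unmodeled tail keeps the misalignment above minus infinity.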
Influence of the AEC on the EQ
For evaluation of the LRC subsystem we use the error criterion from [2]. The spectrum of the concatenated system of $c_{\mathrm{EQ}}[k]$ and $h[k]$ can be calculated as $E[m] = H[m] \cdot C_{\mathrm{EQ}}[m]$, with $H[m]$ and $C_{\mathrm{EQ}}[m]$ being the discrete-frequency room transfer function (RTF) and the transfer function of the LRC filter, respectively. The variance of the logarithmic spectrum of $E[m]$ gives a measure for the spectral flatness of the equalized system and thus for the quality of equalization:
$$\sigma_E^2 = \frac{1}{m_{\max} - m_{\min}} \sum_{m=m_{\min}}^{m_{\max}} \left( 20 \cdot \log_{10}|E[m]| - \bar{E}_{\mathrm{dB}} \right)^2 .$$
Here, the mean value of the logarithmic spectrum is given by
$$\bar{E}_{\mathrm{dB}} = \frac{1}{m_{\max} - m_{\min}} \sum_{m=m_{\min}}^{m_{\max}} 20 \cdot \log_{10}|E[m]| .$$
The limits $m_{\min}$ and $m_{\max}$ are chosen to match the Discrete Fourier Transform (DFT) bins at 200 Hz and 3700 Hz, respectively, because this is the desired equalization band specified by the reference system $\mathbf{d}$.
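The flatness measure can be sketched in a few lines of NumPy. The function name `spectral_variance_db`, the DFT length `n_fft`, and the rounding of the band limits to the nearest bin are assumptions of this example; the normalization by $m_{\max} - m_{\min}$ follows the formulas as printed.

```python
import numpy as np


def spectral_variance_db(h, c_eq, fs=8000, f_lo=200.0, f_hi=3700.0, n_fft=8192):
    """Variance of the log-magnitude spectrum of E[m] = H[m] * C_EQ[m] between f_lo and f_hi."""
    # n_fft should be at least as long as the longer of the two impulse responses.
    H = np.fft.rfft(h, n_fft)
    C = np.fft.rfft(c_eq, n_fft)
    E = H * C
    m_min = int(round(f_lo * n_fft / fs))  # DFT bin nearest to f_lo
    m_max = int(round(f_hi * n_fft / fs))  # DFT bin nearest to f_hi
    E_dB = 20.0 * np.log10(np.abs(E[m_min:m_max + 1]))
    n = m_max - m_min                      # normalization as printed in the text
    mean_dB = E_dB.sum() / n
    return float(((E_dB - mean_dB) ** 2).sum() / n)


# A perfectly flat overall response at 0 dB gives zero variance; a two-tap "room" does not.
print(spectral_variance_db([1.0], [1.0]))
print(spectral_variance_db([1.0, 0.5], [1.0]))
```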
Figure 4 shows the variance $\sigma_E^2$ as a function of the system misalignment of the AEC for the white noise and the speech input signal. Note that the x-axis is flipped, so that high values of $D_{\mathrm{dB}}$, which indicate poor convergence, are on the left and smaller values, which indicate good convergence, are on the right. The two horizontal lines at $\sigma_E^2 = 1.05$ and $\sigma_E^2 = 14.34$ indicate the least-squares equalization with an ideally known impulse response and the unequalized case (EQ switched off), respectively.
noequalization
LSequalizerwitha-prioriinformation
-14
-12
-10-8
-6
-4
20
10
0
white noise
speech
σ2
E(DdB )
DdB
Figure 4: Variance σ2
Eof the equalized system depending on
the degree of system identification
The system shows the same behavior for both white noise and speech input. If the AEC shows poor convergence, i.e. the system misalignment is high, the EQ introduces further distortion into the loudspeaker signal and should be switched off during such periods. The better the system identification, the lower the variance, which indicates better equalization.
Conclusion
In this contribution we analyzed the mutual influences of the video-conferencing subsystems Listening Room Compensation and Acoustic Echo Cancellation. The quality of system identification, and thus of echo reduction, was shown as a function of the coloration introduced by the equalizer. Furthermore, the quality of equalization was analyzed as a function of the degree of system identification. Using these results, it is possible to control the adaptation of one subsystem by analyzing the other in order to achieve a better overall performance.
References
[1] J. Benesty, D. R. Morgan, and M. M. Sondhi. A Better Understanding and an Improved Solution to the Specific Problems of Stereophonic Acoustic Echo Cancellation. IEEE Trans. on Speech and Audio Processing, 6(2):156–165, March 1998.
[2] J. N. Mourjopoulos. Digital Equalization of Room Acoustics. Journal of the Audio Engineering Society, 42(11):884–900, November 1994.