Room Impulse Response Shaping based on Estimates of Room Impulse Responses
Stefan Goetze1, Markus Kallinger2, Alfred Mertins3, and Karl-Dirk Kammeyer1
1University of Bremen, Dept. of Communications Engineering, D-28334 Bremen, Email: goetze@uni-bremen.de
2University of Oldenburg, Signal Processing Group, D-26111 Oldenburg, Email: markus.kallinger@uni-oldenburg.de
3University of Lübeck, Institute for Signal Processing, D-23538 Lübeck, Email: alfred.mertins@isip.uni-luebeck.de
Abstract
Modern hands-free telecommunication systems have to reduce several acoustic disturbances of the desired speech signal, among them echoes of the far-end speaker caused by the acoustic coupling between loudspeaker and microphone, and room reverberation for the near-end listener caused by reflections at the room boundaries. Common systems for listening-room compensation (LRC) try to design an equalization filter that is the inverse of the room impulse response in order to achieve spectral flatness of the overall system formed by the equalizer and the room transfer function. Such designs need reliable knowledge of the room impulse response (RIR), which is not available in real systems. Furthermore, it has been shown that shaping approaches which preserve the masking effects of room impulse responses lead to perceptually better results, especially in the case of estimation errors. In this contribution, a system for room impulse response shaping is analyzed with respect to the influence of realistic RIR estimates obtained from an acoustic echo canceller (AEC).
Listening Room Compensation
In an acoustic scenario for listening-room compensation (LRC), the equalizer $c_{\mathrm{EQ}}[k]$ precedes the acoustic channel, i.e. the room impulse response (RIR) $h[k]$, as depicted in Figure 1. Its goal is to reduce the influence of the RIR at the position of the reference microphone, where the near-end user of the telecommunication system is assumed to be located. A straightforward inversion of an RIR by a stable, causal infinite impulse response (IIR) filter is, in general, not possible, since RIRs have hundreds of zeros close to the unit circle in the z-domain, both inside and outside it [4]. Thus, a common approach is the least-squares equalizer (LS-EQ) [3], which minimizes the error signal $e_{\mathrm{EQ}}[k]$ and thereby the distance between the overall system, $c_{\mathrm{EQ}}[k]$ convolved with $h[k]$, and a desired target system $d[k]$. Least-squares equalizers need very accurate estimates of the RIR, which may not always be available, since a typical RIR is time-variant, e.g. due to speaker movements, and may have a length of thousands of taps. Thus, the RIR has to be identified by an appropriate adaptive filter and, especially during initial convergence or after RIR changes, the system identification may be too inaccurate for a good equalizer design [1].
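The least-squares design described above can be illustrated with a minimal sketch. It assumes a delayed unit pulse as target system $d[k]$ and a toy three-tap mixed-phase response in place of a measured RIR; the function names, filter length and delay are illustrative choices, not taken from [3].

```python
import numpy as np

def convolution_matrix(h, n_eq):
    """Return H such that H @ c equals np.convolve(h, c) for len(c) == n_eq."""
    n_h = len(h)
    H = np.zeros((n_h + n_eq - 1, n_eq))
    for col in range(n_eq):
        H[col:col + n_h, col] = h
    return H

def ls_equalizer(h, n_eq, delay):
    """Least-squares equalizer: minimize ||H c - d||^2 over the taps c."""
    H = convolution_matrix(h, n_eq)
    d = np.zeros(H.shape[0])
    d[delay] = 1.0  # assumed target system: a pure delay
    c, *_ = np.linalg.lstsq(H, d, rcond=None)
    return c, d

# Hypothetical toy response with one zero inside and one outside the unit circle.
h = np.array([0.5, 1.0, -0.3])
c_eq, d = ls_equalizer(h, n_eq=64, delay=32)

overall = np.convolve(h, c_eq)            # equalizer followed by the room
error_energy = np.sum((overall - d) ** 2)
print(f"residual error energy: {error_energy:.3e}")
```

With a perfectly known response the residual is very small; the design has no built-in protection against errors in `h`, which is the sensitivity discussed in the text.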
The goal of RIR shaping approaches [2] is to shorten the RIR so as to maximize the energy in its first 50 ms, since early reflections within the first 50 ms are known to enhance speech intelligibility, while the reverberant tail of the RIR decreases it. To avoid late echoes in the equalized system, RIR shaping, which aims at reducing the room reverberation time $\tau_{60}$, is a promising approach [2].
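The quantity that shaping tries to increase, the share of energy in the first 50 ms, can be measured with a few lines of code. The sketch below only evaluates this ratio; it is not the shaping algorithm of [2]. The sampling rate of 8 kHz and the two exponentially decaying responses are assumptions for illustration, and the 50 ms window is counted from sample 0 rather than from the arrival of the direct sound.

```python
import numpy as np

def early_energy_ratio(h, fs, t_early=0.05):
    """Fraction of the impulse-response energy within the first t_early seconds."""
    n_early = int(round(t_early * fs))
    energy = np.asarray(h, dtype=float) ** 2
    return energy[:n_early].sum() / energy.sum()

fs = 8000                                  # assumed sampling rate in Hz
k = np.arange(4000)
h_long = np.exp(-k / 800.0)                # hypothetical slowly decaying response
h_short = np.exp(-k / 200.0)               # hypothetical shortened response

ratio_long = early_energy_ratio(h_long, fs)
ratio_short = early_energy_ratio(h_short, fs)
print(f"early energy share: long {ratio_long:.3f}, short {ratio_short:.3f}")
```

A shortened response concentrates more of its energy in the early part, so its ratio is closer to one.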
All LRC methods need knowledge of the RIR, which has to be measured or estimated adaptively and may thus be deficient. In hands-free systems, acoustic echo cancellers (AECs) are commonly used to reduce echoes for the far-end listener. This is done by identifying the RIRs.
System Identification
Since the AEC is an adaptive filter that identifies the room impulse response (RIR), its estimate can be used for the EQ design.
Figure 1: Combined system with LRC filter $c_{\mathrm{EQ}}[k]$ and acoustic echo canceller $c_{\mathrm{AEC}}[k]$. The RIR can be split into a part modeled by the AEC, $c_{\mathrm{AEC}}[k]$, and an estimation error $\tilde{h}[k]$.
As depicted in Figure 1, the RIR $h[k]$ can be split up into one part $\hat{h}[k] = c_{\mathrm{AEC}}[k]$, which is correctly identified by the AEC, and an estimation error $\tilde{h}[k]$:

$$h[k] = \hat{h}[k] + \tilde{h}[k] = c_{\mathrm{AEC}}[k] + \tilde{h}[k] \qquad (1)$$
Estimation errors $\tilde{h}[k]$ mainly have two causes: 1) the system identification is incomplete because the filter has not yet converged and/or is too short to identify the RIR over its full length; 2) the equalizer is designed for the spatial position of the reference microphone, but the user is not located at this position.
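The decomposition in (1) and the resulting normalized misalignment can be sketched as follows for the first error cause, a filter that is too short. The toy response, the filter length of 10 taps and the function names are hypothetical; the misalignment is assumed to be the ratio of squared norms expressed in dB.

```python
import numpy as np

def split_rir(h, c_aec):
    """Return (h_hat, h_tilde) with h = h_hat + h_tilde.

    c_aec is zero-padded (or truncated) to the length of h.
    """
    h = np.asarray(h, dtype=float)
    h_hat = np.zeros_like(h)
    n = min(len(h), len(c_aec))
    h_hat[:n] = c_aec[:n]
    return h_hat, h - h_hat

def misalignment_db(h, h_tilde):
    """Normalized misalignment 10*log10(||h_tilde||^2 / ||h||^2) in dB."""
    return 10.0 * np.log10(np.sum(h_tilde ** 2) / np.sum(h ** 2))

# Hypothetical 50-tap response; the AEC is assumed to have converged
# perfectly but to cover only the first 10 taps.
h = 0.9 ** np.arange(50)
c_aec = h[:10]

h_hat, h_tilde = split_rir(h, c_aec)
D = misalignment_db(h, h_tilde)
print(f"misalignment: {D:.2f} dB")
```

Here the whole estimation error is the unmodeled tail of the response, so a longer filter lowers the misalignment even if the adaptation itself is perfect.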
Simulation Results
The influence of the two error sources described above on the performance of the equalizer is analyzed in the following.
DAGA 2008 - Dresden
829
Insufficient AEC convergence: Figure 2 compares the conventional least-squares EQ with the RIR shaping approach according to [2] by means of the Bark spectral distortion (BSD) measure [5], which is commonly used to evaluate dereverberation algorithms. The BSD is shown for different states of convergence of the AEC filter that delivers the RIR estimate.
Figure 2: Bark spectral distortion (BSD) depending on the normalized AEC convergence state $D = \|\tilde{h}[k]\|^2 / \|h[k]\|^2$. (Plot: BSD, 0 to 1.5, over the relative AEC system misalignment, −20 to 0; curves: Least-squares EQ, RIR Shaping.)
It can be seen from Figure 2 that the EQ performance of the LS design is slightly better for a good RIR estimate (D < −7 dB). However, in practical situations the AEC has to track the time-variant RIR, and thus its convergence state may be worse than −7 dB most of the time. In this region the LS-EQ may lead to severe signal distortions, as can be seen from the steep rise of the BSD, while the RIR shaping approach performs considerably better.
Figure 3: Energy decay curve (EDC) depending on the AEC convergence state $D = \|\tilde{h}[k]\|^2 / \|h[k]\|^2$. (Plot: EDC, −30 to 0, over the discrete sample index k, 0 to 1500; curves: original RIR and equalized systems for D = −0.4 dB, −4 dB, −7 dB and −10 dB.)
Figure 3 shows the performance of the RIR shaping approach for different states of AEC convergence. The thick solid black line shows the energy decay curve (EDC) of the original RIR; the other lines show the EDCs of the equalized systems for different AEC convergence states. A poor AEC convergence state degrades the performance of the EQ, but the approach is robust up to D = −4 dB. In combination with Figure 2, it can be seen that even for very poor AEC convergence the distortions introduced into the desired signal are far smaller than those of the LS-EQ. Thus, the RIR shaping approach according to [2] is more robust with regard to RIR estimation errors.
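For readers who want to reproduce curves of this kind, an energy decay curve can be computed from any impulse response as sketched below. The sketch assumes the usual Schroeder backward-integration definition, normalized to 0 dB at k = 0; the paper does not state how its EDCs were computed, and the decaying test response is hypothetical.

```python
import numpy as np

def energy_decay_curve_db(h):
    """Energy remaining from sample k onwards, in dB relative to the total energy."""
    energy = np.asarray(h, dtype=float) ** 2
    remaining = np.cumsum(energy[::-1])[::-1]   # backward integration
    return 10.0 * np.log10(remaining / remaining[0])

# Hypothetical exponentially decaying response (not measured data).
k = np.arange(1500)
h = np.exp(-k / 300.0)

edc = energy_decay_curve_db(h)
print(f"EDC at k = 500: {edc[500]:.1f} dB")
```

Applying the same function to the equalized overall response, i.e. the equalizer convolved with the true RIR, gives the kind of comparison described for Figure 3: a faster-falling curve indicates a shorter effective response.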
Spatial mismatch: Figure 4 shows the robustness against spatial mismatch with respect to the BSD measure. An EQ is designed for a fixed position (x = 2.6 meters and y = 2.1 meters from the room corner). This point is located at the center of both the left and the right subplot. The EQ is then applied at spatially differing positions (up to 10 cm in every direction). The LS-EQ clearly performs best at the exact position it is designed for (center of left plot), but its performance degrades heavily under spatial mismatch. In contrast, the RIR shaping approach according to [2] (right subplot) is spatially robust, which is a very important property, since the user of the hands-free system will be located at some distance from the reference microphone.
Figure 4: BSD measure for spatial mismatch for LS-EQ (left) and room impulse response shaping (right). (Plots: x in meters, 2.5 to 2.7, and y in meters, 2 to 2.2; BSD scale, 0.2 to 1.)
Conclusion
In this contribution, listening-room compensation approaches are evaluated for the case of imperfect system identification due to incomplete RIR estimation or spatial mismatch. The presented results indicate that room impulse response shaping approaches are very promising for real-world hands-free systems because they perform well even with imperfect knowledge of the room impulse response, which is often assumed to be perfectly known in the literature but, in general, has to be estimated.
References
[1] S. Goetze, M. Kallinger, A. Mertins, and K.-D. Kammeyer. System Identification for Multi-Channel Listening-Room Compensation using an Acoustic Echo Canceller. In Workshop on Hands-free Speech Communication and Microphone Arrays (HSCMA), Trento, Italy, May 2008.
[2] M. Kallinger and A. Mertins. Room Impulse Response Shortening by Channel Shortening Concepts. In Proc. Asilomar Conference on Signals, Systems, and Computers, Pacific Grove, CA, USA, pages 209–212, Oct. 30 - Nov. 2 2005.
[3] J. N. Mourjopoulos, P. M. Clarkson, and J. K. Hammond. A Comparative Study of Least-Squares and Homomorphic Techniques for the Inversion of Mixed Phase Signals. In Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP), pages 1858–1861, 1982.
[4] S. T. Neely and J. B. Allen. Invertibility of a Room Impulse Response. Journal of the Acoustical Society of America (JASA), 66:165–169, July 1979.
[5] S. Wang, A. Sekey, and A. Gersho. An Objective Measure for Predicting Subjective Quality of Speech Coders. IEEE J. Selected Areas in Communications, 10(5):819–829, June 1992.