Room Impulse Response Shaping based on Estimates of Room Impulse Responses

Stefan Goetze (1), Markus Kallinger (2), Alfred Mertins (3), and Karl-Dirk Kammeyer (1)
(1) University of Bremen, Dept. of Communications Engineering, D-28334 Bremen, Email: goetze@uni-bremen.de
(2) University of Oldenburg, Signal Processing Group, D-26111 Oldenburg, Email: markus.kallinger@uni-oldenburg.de
(3) University of Lübeck, Institute for Signal Processing, D-23538 Lübeck, Email: alfred.mertins@isip.uni-luebeck.de
Abstract
Modern hands-free telecommunication systems have to reduce several acoustic disturbances of the desired speech signal. Among them are echoes of the far-end speaker, caused by the acoustic coupling between loudspeaker and microphone, and room reverberation for the near-end listener, caused by reflections at the room boundaries. Common systems for listening-room compensation (LRC) try to design an equalization filter that is the inverse of the room impulse response, so that the concatenated overall system of equalizer and room transfer function is spectrally flat. Such designs need reliable knowledge of the room impulse response (RIR), which is not available in real systems. Furthermore, it has been shown that shaping approaches which preserve the masking effects of room impulse responses lead to perceptually better results, especially in the case of estimation errors. In this contribution, a system for room impulse response shaping is analyzed with respect to the influence of realistic RIR estimates, which are obtained by an acoustic echo canceller (AEC).
Listening Room Compensation
In an acoustic scenario for listening-room compensation (LRC), the equalizer $c_{EQ}[k]$ precedes the acoustic channel, i.e. the room impulse response (RIR) $h[k]$, as depicted in Figure 1. Its goal is to reduce the influence of the RIR at the position of the reference microphone, where the near-end user of the telecommunication system is assumed to be located. A straightforward inversion of an RIR by a stable, causal infinite impulse response (IIR) filter is, in general, not possible, since RIRs have hundreds of zeros close to the unit circle in the z-domain, both inside and outside of it [4]. A common approach is therefore the least-squares equalizer (LS-EQ) [3], which minimizes the error signal $e_{EQ}[k]$ and thereby the distance between the overall system, $c_{EQ}[k]$ convolved with $h[k]$, and a desired target system $d[k]$. Least-squares equalizers need very accurate estimates of the RIR, which may not always be available, since a typical RIR is time-variant, e.g. due to possible speaker movements, and may have a length of thousands of taps. Thus, the RIR has to be identified by an appropriate adaptive filter and, especially in periods of initial convergence or after RIR changes, the system identification may be insufficient for a good equalizer design [1].
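The least-squares design described above can be sketched numerically as follows. This is only an illustration under stated assumptions, not the implementation used in the paper: the target $d[k]$ is assumed to be a delayed unit impulse, the RIR is a three-tap toy example with one zero inside and one outside the unit circle, and the helper names `convolution_matrix` and `ls_equalizer` are hypothetical.

```python
import numpy as np

def convolution_matrix(h, n_eq):
    """Return H such that H @ c equals np.convolve(h, c) for len(c) == n_eq."""
    H = np.zeros((len(h) + n_eq - 1, n_eq))
    for j in range(n_eq):
        H[j:j + len(h), j] = h
    return H

def ls_equalizer(h, n_eq, delay):
    """Least-squares equalizer for the assumed target d[k] = delta[k - delay]."""
    H = convolution_matrix(h, n_eq)
    d = np.zeros(H.shape[0])
    d[delay] = 1.0
    # Minimize ||H c - d||^2, i.e. the energy of the error signal.
    c_eq, *_ = np.linalg.lstsq(H, d, rcond=None)
    return c_eq, d

# Toy mixed-phase "RIR" (hypothetical values): zeros at z = 0.5 and z = 2.
h = np.array([1.0, -2.5, 1.0])
c_eq, d = ls_equalizer(h, n_eq=64, delay=32)
overall = np.convolve(h, c_eq)            # equalizer followed by the room
error_energy = np.sum((overall - d) ** 2)
print(f"residual error energy: {error_energy:.3e}")
```

The delay in the target gives the equalizer room to approximate the non-causal part of the inverse, which is why a mixed-phase response can still be equalized approximately by a causal FIR filter.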
The goal of RIR shaping approaches [2] is to shorten the RIR so as to maximize the energy in its first 50 ms, since the early reflections within the first 50 ms are known to enhance speech intelligibility, while the reverberant tail of the RIR decreases it. To avoid late echoes in the equalized system, RIR shaping that aims at reducing the room reverberation time $\tau_{60}$ is a promising approach [2].
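To make the 50 ms criterion concrete, the following sketch computes the fraction of RIR energy that falls into the first 50 ms. The sampling rate of 8 kHz, the synthetic exponentially decaying RIR, and the function name `early_energy_ratio` are assumptions for illustration only. The window is counted from sample 0 here, whereas in practice it would typically start at the arrival of the direct sound.

```python
import numpy as np

def early_energy_ratio(h, fs, t_early=0.05):
    """Fraction of the RIR energy contained in the first t_early seconds."""
    n_early = int(round(t_early * fs))
    return np.sum(h[:n_early] ** 2) / np.sum(h ** 2)

fs = 8000                                   # assumed sampling rate in Hz
rng = np.random.default_rng(1)
k = np.arange(2000)
h = rng.standard_normal(len(k)) * np.exp(-k / 400.0)   # synthetic toy RIR
ratio = early_energy_ratio(h, fs)
print(f"energy share of the first 50 ms: {ratio:.2f}")
```

A shaping equalizer in the sense of the text would aim to push this ratio towards 1 for the overall system of equalizer and room.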
All methods for LRC need knowledge of the RIR, which has to be measured or estimated adaptively and thus may be deficient. In hands-free systems, acoustic echo cancellers (AECs) are commonly used to reduce echoes for the far-end listener. This is done by identifying the RIRs.
System Identification
Since the AEC is an adaptive filter which identifies the room impulse response (RIR), its estimate can be used for the EQ design.
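How an adaptive filter arrives at such an RIR estimate can be illustrated with a normalized LMS (NLMS) update. The paper does not state which adaptation algorithm the AEC uses, so NLMS is an assumed choice here, and the step size, filter length, white-noise excitation and the name `nlms_identify` are hypothetical.

```python
import numpy as np

def nlms_identify(x, y, n_taps, mu=0.5, eps=1e-8):
    """Adapt an FIR filter c so that its output tracks y (NLMS, an assumed choice)."""
    c = np.zeros(n_taps)
    x_buf = np.zeros(n_taps)           # most recent input samples, newest first
    for k in range(len(x)):
        x_buf = np.roll(x_buf, 1)
        x_buf[0] = x[k]
        e = y[k] - c @ x_buf           # a-priori error of the echo canceller
        c += mu * e * x_buf / (x_buf @ x_buf + eps)
    return c

rng = np.random.default_rng(0)
h = rng.standard_normal(32) * np.exp(-0.2 * np.arange(32))   # synthetic toy RIR
x = rng.standard_normal(5000)          # white excitation signal (loudspeaker)
y = np.convolve(x, h)[:len(x)]         # noise-free microphone signal
c_aec = nlms_identify(x, y, n_taps=32)
print("relative misalignment:", np.sum((c_aec - h) ** 2) / np.sum(h ** 2))
```

In this idealized setting (white excitation, no noise, sufficient filter length) the estimate converges almost perfectly. With speech excitation, background noise, or a filter shorter than the RIR, a residual estimation error remains.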
[Figure 1: block diagram of the near-end room $h[k]$. Labels in the diagram: $c_{EQ}[k]$, $c_{AEC}[k]$, $\tilde{h}[k]$, $d[k]$, $s_f[k]$, $x[k]$, $y[k]$, $\hat{y}[k]$, $\psi[k]$, $\hat{\psi}[k]$, $e_{AEC}[k]$, $e_{EQ}[k]$.]
Figure 1: Combined system with LRC filter $c_{EQ}[k]$ and acoustic echo canceller $c_{AEC}[k]$. The RIR can be split into a part modeled by the AEC, $c_{AEC}[k]$, and an estimation error $\tilde{h}[k]$.
As depicted in Figure 1, the RIR $h[k]$ can be split up into one part $\hat{h}[k] = c_{AEC}[k]$, which is correctly identified by the AEC, and an estimation error $\tilde{h}[k]$:
$$h[k] = \hat{h}[k] + \tilde{h}[k] = c_{AEC}[k] + \tilde{h}[k] \qquad (1)$$
The estimation error $\tilde{h}[k]$ mainly has two causes: 1) the system identification is incomplete due to unfinished filter convergence and/or a filter that is too short to identify the RIR over its full length; 2) the equalizer is designed for the spatial position of the reference microphone, but the user is not located at this position.
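The first error source can be quantified by the normalized misalignment $D = \|\tilde{h}[k]\|^2 / \|h[k]\|^2$ that is used in the simulation results. The sketch below evaluates it in dB for an AEC filter that is too short. The toy RIR and the function name `misalignment_db` are assumptions for illustration.

```python
import numpy as np

def misalignment_db(h, h_hat):
    """Normalized misalignment D = ||h - h_hat||^2 / ||h||^2 in dB."""
    n = max(len(h), len(h_hat))
    h_pad = np.pad(h, (0, n - len(h)))
    h_hat_pad = np.pad(h_hat, (0, n - len(h_hat)))
    h_tilde = h_pad - h_hat_pad        # estimation error, cf. Eq. (1)
    return 10.0 * np.log10(np.sum(h_tilde ** 2) / np.sum(h_pad ** 2))

h = 0.5 ** np.arange(8)                # toy RIR with 8 taps (hypothetical)
h_hat = h[:4]                          # estimate that models only the first 4 taps
D = misalignment_db(h, h_hat)
print(f"D = {D:.1f} dB")
```

Even a perfectly converged filter leaves a nonzero $D$ here, because the unmodeled tail of the RIR remains in $\tilde{h}[k]$.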
Simulation Results
The influences of the two possible errors described before
on the performance of the equalizer will be analyzed in
the following.
DAGA 2008 - Dresden
829
Insufficient AEC convergence: Figure 2 compares the conventional least-squares EQ with the RIR shaping approach according to [2] by means of the Bark spectral distortion (BSD) measure [5], which is commonly used to evaluate dereverberation algorithms. The BSD is shown for different states of convergence of the AEC filter, which delivers the RIR estimate.
[Figure 2 plot: BSD (vertical axis, 0 to 1.5) over the relative AEC system misalignment (horizontal axis, −20 to 0) for the least-squares EQ and RIR shaping.]
Figure 2: Bark spectral distortion (BSD) depending on the normalized AEC convergence state $D = \|\tilde{h}[k]\|^2 / \|h[k]\|^2$.
Figure 2 shows that the LS design performs slightly better for a good RIR estimate ($D < -7$ dB). In practical situations, however, the AEC has to follow the time-variant RIR, so its convergence state may be worse than −7 dB most of the time. In this region the LS-EQ may lead to severe signal distortions, as can be seen from the steep rise of the BSD, while the RIR shaping approach performs considerably better.
[Figure 3 plot: EDC (vertical axis, −30 to 0) over the discrete sample index k (horizontal axis, 0 to 1500). Legend: Original RIR, −0.4dB, −4dB, −7dB, −10dB, 10dB, 0.4dB.]
Figure 3: Energy decay curve (EDC) depending on the AEC convergence state $D = \|\tilde{h}[k]\|^2 / \|h[k]\|^2$.
Figure 3 shows the performance of the RIR shaping approach for different states of AEC convergence. The thick solid black line shows the energy decay curve (EDC) of the original RIR; the other lines show the EDCs of the equalized systems for different AEC convergence states. A poor AEC convergence state degrades the performance of the EQ, but the EQ is robust up to $D = 4$ dB. In combination with Figure 2, it can be seen that even for a very poor AEC convergence the distortions introduced into the desired signal are far smaller than those of the LS-EQ. Thus, it can be stated that the RIR shaping approach according to [2] is more robust with regard to RIR estimation errors.
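For readers who want to reproduce curves of this kind, an energy decay curve can be computed by backward integration of the squared impulse response (Schroeder integration). The paper does not state how its EDCs were computed, so this common definition is an assumption, as are the toy impulse response and the function name `energy_decay_curve_db`.

```python
import numpy as np

def energy_decay_curve_db(h):
    """Backward-integrated energy of h, normalized to 0 dB at k = 0."""
    # energy[k] is the energy remaining in h from sample k onward.
    energy = np.cumsum(h[::-1] ** 2)[::-1]
    return 10.0 * np.log10(energy / energy[0])

h = 0.9 ** np.arange(200)              # toy exponentially decaying response
edc = energy_decay_curve_db(h)
print(f"EDC after 50 samples: {edc[50]:.1f} dB")
```

Applied to the overall response of equalizer and room, a steeper EDC than that of the original RIR indicates a shorter effective reverberation time.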
Spatial mismatch: Figure 4 shows the robustness against spatial mismatch in terms of the BSD measure. An EQ is designed for a fixed position ($x = 2.6$ meters and $y = 2.1$ meters from the room corner). This point is located at the center of the left and right subplots. The EQ is then applied at spatially differing positions (10 cm in every direction). The LS-EQ clearly performs best at the exact position it is designed for (center of left plot), but its performance degrades heavily under spatial mismatch. In contrast, the RIR shaping approach according to [2] (right subplot) is spatially robust, which is a very important property since the user of the hands-free system will be located at some distance from the reference microphone.
[Figure 4 plots: two panels over x in meter (2.5 to 2.7) and y in meter (2 to 2.2), with a BSD scale from 0.2 to 1.]
Figure 4: BSD measure for spatial mismatch for LS-EQ (left) and room impulse response shaping (right).
Conclusion
In this contribution, listening-room compensation approaches are evaluated for the case of imperfect system identification due to incomplete RIR estimation or due to spatial mismatch. The presented results indicate that room impulse response shaping approaches are very promising for real-world hands-free systems, because they perform well even with imperfect knowledge of the room impulse response, which is often assumed to be perfectly known in the literature but, in general, has to be estimated.
References
[1] S. Goetze, M. Kallinger, A. Mertins, and K.-D. Kammeyer. System Identification for Multi-Channel Listening-Room Compensation using an Acoustic Echo Canceller. In Workshop on Hands-free Speech Communication and Microphone Arrays (HSCMA), Trento, Italy, May 2008.
[2] M. Kallinger and A. Mertins. Room Impulse Response Shortening by Channel Shortening Concepts. In Proc. Asilomar Conference on Signals, Systems, and Computers, Pacific Grove, CA, USA, pages 209–212, Oct. 30 - Nov. 2, 2005.
[3] J. N. Mourjopoulos, P. M. Clarkson, and J. K. Hammond. A Comparative Study of Least-Squares and Homomorphic Techniques for the Inversion of Mixed Phase Signals. In Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP), pages 1858–1861, 1982.
[4] S. T. Neely and J. B. Allen. Invertibility of a Room Impulse Response. Journal of the Acoustical Society of America (JASA), 66:165–169, July 1979.
[5] S. Wang, A. Sekey, and A. Gersho. An Objective Measure for Predicting Subjective Quality of Speech Coders. IEEE J. Selected Areas in Communications, 10(5):819–829, June 1992.