Room Impulse Response Reshaping by p-Norm Optimization based on Estimates of Room Impulse Responses

Jan Ole Jungmann1, Stefan Goetze2, Alfred Mertins1
1University of Lübeck, Institute for Signal Processing, D-23538 Lübeck, Email: {jungmann, mertins}@isip.uni-luebeck.de
2Fraunhofer Institute for Digital Media Technology, D-26129 Oldenburg, Email: s.goetze@idmt.fraunhofer.de
Abstract
Hands-free telecommunication raises several real-world problems, such as
corruption of the desired signal by additive noise, acoustic echoes, and
reverberation. This paper addresses the mutual impact of the subsystems for
Acoustic Echo Cancellation (AEC) and Listening Room Compensation (LRC) based
on p-norm optimization. In acoustic systems for LRC, the equalizer is placed
in front of the loudspeaker. An estimate of the room impulse response (RIR)
is necessary for the equalizer to compensate for the influence of the RIR at
the position of the reference microphone, where the human user is assumed to
be located. Since the RIR is identified by the acoustic echo canceller
anyway, its estimate can be used to design the equalizer. This contribution
investigates how the quality of dereverberation depends on the degree of
system identification. Furthermore, the influence of the equalizer on the
AEC is analyzed.
RIR Reshaping by p-Norm Optimization
In [3] the well-known least-squares based design rule to build an equalizer
that renders certain properties on the global impulse response has been
generalized by introducing a p-norm based objective function. Two weighting
windows for the desired and the unwanted part of the global impulse response
are defined. The optimization problem reads

$$\min_{h}: \; f(h) = \log\left(\frac{\|g_u\|_{p_u}}{\|g_d\|_{p_d}}\right), \tag{1}$$

with $h$ being the equalizer one aims to design,
$g_d = \mathrm{diag}\{w_d\}\, C h$ being the desired part of the global
impulse response of length $L_g$ ($g_u$ accordingly), $C$ being the
convolution matrix made up of the RIR $c(n)$, $\|\cdot\|_p$ denoting the
p-norm of a vector, and $\mathrm{diag}\{\cdot\}$ transforming a vector into
a diagonal matrix. The optimal solution is approximated by applying a
gradient-descent procedure.
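
As an illustration of objective (1), the following minimal sketch evaluates f(h) = log(||g_u||_pu / ||g_d||_pd) and its analytic gradient, and runs a simple gradient descent. The backtracking step-size rule, the unit-impulse initialization, and all function names are assumptions made for this example; the source only states that a gradient-descent procedure is used.

```python
import numpy as np

def convolution_matrix(c, Lh):
    """Return C of shape (len(c) + Lh - 1, Lh) with C @ h == np.convolve(c, h)."""
    Lc = len(c)
    C = np.zeros((Lc + Lh - 1, Lh))
    for k in range(Lh):
        C[k:k + Lc, k] = c
    return C

def objective_and_gradient(h, C, wd, wu, pd, pu):
    """Evaluate f(h) = log(||g_u||_pu / ||g_d||_pd) and its gradient w.r.t. h."""
    g = C @ h
    gd, gu = wd * g, wu * g            # diag{w} C h as an element-wise product
    sd = np.sum(np.abs(gd) ** pd)      # ||g_d||_pd raised to the power pd
    su = np.sum(np.abs(gu) ** pu)      # ||g_u||_pu raised to the power pu
    f = np.log(su) / pu - np.log(sd) / pd
    grad = (C.T @ (wu * np.sign(gu) * np.abs(gu) ** (pu - 1)) / su
            - C.T @ (wd * np.sign(gd) * np.abs(gd) ** (pd - 1)) / sd)
    return f, grad

def design_equalizer(c, Lh, wd, wu, pd=10, pu=20, iterations=100):
    """Gradient descent with backtracking (step rule is an assumption)."""
    C = convolution_matrix(c, Lh)
    h = np.zeros(Lh)
    h[0] = 1.0                         # assumed start: unit impulse
    f, grad = objective_and_gradient(h, C, wd, wu, pd, pu)
    step = 1.0
    for _ in range(iterations):
        while step > 1e-12:
            h_new = h - step * grad
            f_new, grad_new = objective_and_gradient(h_new, C, wd, wu, pd, pu)
            if f_new < f:              # accept only steps that decrease f
                break
            step *= 0.5
        else:
            break                      # no decreasing step found, stop
        h, f, grad = h_new, f_new, grad_new
        step *= 2.0                    # try a larger step next time
    return h, f
```

Because a step is accepted only when it lowers f, the returned objective value is never larger than that of the initial unit impulse.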
Design of the Weighting Windows
Mertins et al. proposed to exploit psychoacoustic findings to reduce the
audible echoes and introduced weighting functions that capture the
compromise temporal masking limit of the human auditory system [3]. The
definitions for the windows read as follows:

$$w_d = [\,\underbrace{0, 0, \ldots, 0}_{N_1},\; \underbrace{1, 1, \ldots, 1}_{N_2},\; \underbrace{0, 0, \ldots, 0}_{N_3}\,]^T \tag{2}$$

and

$$w_u = [\,\underbrace{0, 0, \ldots, 0}_{N_1 + N_2},\; \underbrace{w_0^T}_{N_3}\,]^T \tag{3}$$

where $N_1 = t_0 f_s$, $N_2 = 0.004\,\mathrm{s} \cdot f_s$, and
$N_3 = L_g - N_1 - N_2$, with $f_s$ being the sampling frequency and $t_0$
being the time taken by the direct sound, respectively. The window $w_0$,
with its reciprocal being the compromise masking limit according to [1], is
defined as

$$w_0(n) = 10^{\frac{3}{\log(N_0/(N_1+N_2))} \log\left(\frac{n}{N_1+N_2}\right) + 0.5} \tag{4}$$

with $N_0 = (0.2\,\mathrm{s} + t_0) f_s$ and time index $n$ ranging from
$N_1 + N_2 + 1$ to $L_g - 1$.
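
The window definitions (2) to (4) can be sketched in a few lines. This is a hedged illustration only: the function name, the rounding of N1, N2 and N0 to integers, and the zero-based index convention (w0 is evaluated at n = N1+N2, ..., Lg-1 so that it fills all N3 entries) are assumptions of this sketch, and the values of fs and t0 in the usage lines are hypothetical.

```python
import numpy as np

def weighting_windows(Lg, fs, t0):
    """Build w_d and w_u of length Lg following (2)-(4).

    Assumptions of this sketch: sample counts are rounded to integers and
    w0 is evaluated at the zero-based positions n = N1+N2, ..., Lg-1.
    """
    N1 = int(round(t0 * fs))            # samples taken by the direct sound
    N2 = int(round(0.004 * fs))         # 4 ms desired part
    N3 = Lg - N1 - N2                   # remaining (unwanted) part
    N0 = int(round((0.2 + t0) * fs))    # 200 ms after the direct sound

    wd = np.concatenate([np.zeros(N1), np.ones(N2), np.zeros(N3)])

    n = np.arange(N1 + N2, Lg)
    exponent = 3.0 * np.log(n / (N1 + N2)) / np.log(N0 / (N1 + N2)) + 0.5
    w0 = 10.0 ** exponent
    wu = np.concatenate([np.zeros(N1 + N2), w0])
    return wd, wu

# Hypothetical values, not taken from the source:
wd, wu = weighting_windows(Lg=4000, fs=8000, t0=0.002)
```

With this convention the reciprocal 1/w0 equals 10^(-0.5) at the start of the unwanted part and 10^(-3.5) at n = N0, which follows directly from inserting these indices into (4).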
Spectral Distortions
To attenuate an additional coloration of the loudspeaker signal, we propose
to weaken potential spectral distortions introduced by the equalizer by
applying a short linear prediction error filter that is specifically
designed for the equalizer.
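
The source does not spell out how the prediction error filter is computed. One plausible reading, sketched below purely as an assumption, is autocorrelation-method linear prediction applied to the equalizer coefficients, with the resulting filter cascaded with the equalizer. The function name and the usage values are hypothetical.

```python
import numpy as np

def prediction_error_filter(h, order):
    """Autocorrelation-method linear prediction on the coefficient sequence h.

    Returns the prediction error filter [1, -a_1, ..., -a_order].
    Assumes order < len(h).
    """
    h = np.asarray(h, dtype=float)
    L = len(h)
    # Autocorrelation at lags 0..order (zero lag sits at index L - 1).
    r = np.correlate(h, h, mode="full")[L - 1:L + order]
    # Toeplitz normal equations R a = [r_1, ..., r_order].
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, r[1:order + 1])
    return np.concatenate(([1.0], -a))

# Hypothetical usage: flatten the spectrum of a strongly colored filter.
h = 0.9 ** np.arange(200)
a = prediction_error_filter(h, order=2)
h_flat = np.convolve(h, a)   # cascade of equalizer and prediction error filter
```

For this toy input the cascade is close to a unit impulse, that is, its magnitude spectrum is nearly flat.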
System Identification
Since the AEC is an adaptive filter which identifies the
room impulse response, its estimate can be used for the
equalizer design. Figure 1 depicts the general setup of a
combined LRC/AEC scenario.
Figure 1: Combined system with LRC (EQ) and acoustic echo canceller (AEC);
RIR denotes the listening room, containing the speaker and the microphone.
(Block diagram labels: s(n), EQ, RIR, AEC, e(n).)
The RIR can be split up into one part $\hat{c}(n)$ that is correctly
identified by the AEC and an estimation error $\tilde{c}(n)$, due to
underestimating the order of the RIR and insufficient convergence of the
AEC:

$$c(n) = \hat{c}(n) + \tilde{c}(n). \tag{5}$$
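
To make the split in (5) concrete, the sketch below identifies a toy RIR with an adaptive filter whose order is deliberately too short. The source does not name the adaptation algorithm; NLMS is used here as an assumption, and the step size, signal lengths and toy RIR are hypothetical.

```python
import numpy as np

def nlms_identify(x, d, order, mu=0.5, eps=1e-8):
    """Identify an FIR system from input x and output d with NLMS.

    NLMS is an assumption of this sketch, not the algorithm stated in the
    source. Returns the estimated coefficients c_hat (length = order).
    """
    c_hat = np.zeros(order)
    x_buf = np.zeros(order)            # holds x[n], x[n-1], ..., x[n-order+1]
    for n in range(len(x)):
        x_buf = np.roll(x_buf, 1)
        x_buf[0] = x[n]
        e = d[n] - c_hat @ x_buf       # a-priori error (residual echo)
        c_hat += mu * e * x_buf / (x_buf @ x_buf + eps)
    return c_hat

# Hypothetical toy setup: 20-tap RIR, AEC with only 16 taps.
rng = np.random.default_rng(1)
c = 0.6 ** np.arange(20) * rng.choice([-1.0, 1.0], size=20)
x = rng.standard_normal(5000)
d = np.convolve(x, c)[:len(x)]

c_hat = np.zeros(len(c))
c_hat[:16] = nlms_identify(x, d, order=16)   # identified part, zero-padded
c_tilde = c - c_hat                          # estimation error as in (5)
```

Here the last four taps of c end up entirely in c_tilde (order underestimation), while the first sixteen entries of c_tilde reflect residual convergence error.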
DAGA 2011 - Düsseldorf
611
Simulation Results
The RIR was simulated having a reverberation time of $\tau_{60} = 200$ ms
and a length of $L_c = 2000$ taps. The filter orders of the AEC and the
equalizer $h(k)$ were set to 1800 and 2000, respectively. As input signals
Gaussian white noise and a recorded speech signal (female speaker) were
used. The equalizer was redesigned by optimizing the p-norm based objective
function (equation (1)) every 1000 updates of the AEC with the weighting
windows defined in (2) and (3); further we set $p_d = 10$ and $p_u = 20$.
The length of the prediction error filter has been set to $L_p = 50$ taps.
AEC Convergence
The convergence of the AEC is influenced by the additional coloration
introduced by the equalizer.

Figure 2: Relative System Misalignment $D_{\mathrm{dB}}(n)$. (Plot of
$D_{\mathrm{dB}}(n)$, from 0 down to −30, over the time index n, axis scaled
by 10^5; curves: noise input, EQ off; noise input, EQ on; speech input, EQ
off; speech input, EQ on.)
Figure 2 shows the relative system misalignment

$$D_{\mathrm{dB}}(n) = 10 \cdot \log_{10} \frac{\|c(n) - \hat{c}(n)\|^2}{\|c(n)\|^2} \tag{6}$$

with the quadratic vector norm $\|c(n)\|^2 = c(n)^T c(n)$ for the two input
signals $s(n)$ (noise or speech) and for the cases of active and inactive
equalizer. In the case of noise input, the use of the equalizer only results
in a slightly lower convergence rate and slightly degraded overall system
identification. With a speech signal as the input signal, the use of the
equalizer decreases both the convergence rate and the system identification
performance.
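
The misalignment measure (6) is straightforward to compute. The following minimal sketch zero-pads a shorter estimate to the length of the true RIR before comparing; that padding convention and the function name are assumptions of this example, and the decaying toy RIR is hypothetical.

```python
import numpy as np

def relative_misalignment_db(c, c_hat):
    """Relative system misalignment (6) in dB.

    A shorter estimate c_hat is zero-padded to the length of c
    (assumption of this sketch).
    """
    c = np.asarray(c, dtype=float)
    padded = np.zeros(len(c))
    padded[:len(c_hat)] = c_hat
    err = c - padded
    return 10.0 * np.log10((err @ err) / (c @ c))

# Hypothetical example: 2000-tap decaying RIR, estimate limited to 1800 taps.
rng = np.random.default_rng(2)
c_true = rng.standard_normal(2000) * np.exp(-np.arange(2000) / 300.0)
d_db = relative_misalignment_db(c_true, c_true[:1800])
```

An all-zero estimate gives 0 dB, and every tenfold reduction of the error norm lowers the value by 20 dB.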
Influence of the AEC on the EQ
For evaluation of the LRC subsystem we use a revised version of the
reverberation quantization (RQ) measure [2]. The proposed measure captures
the audible reverberation by integrating the impulse response's energy that
exceeds the compromise temporal masking limit on a logarithmic scale:

$$RQ_r = \sum_{n=N_0}^{L_g - 1} g_{EM}(n) \tag{7}$$

with

$$g_{EM}(n) = \begin{cases} 20 \cdot \log_{10}\left(|g(n)|\, w_u(n)\right), & |g(n)| > \dfrac{1}{w_u(n)} \\ 0, & \text{otherwise} \end{cases} \tag{8}$$

and $N_0$ being the discrete time index that is 4 ms later than the direct
impulse of $g(n)$.
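
A minimal sketch of the measure in (7) and (8) follows. The function name and the toy numbers in the usage lines are hypothetical; the condition |g(n)| > 1/w_u(n) is implemented as |g(n)| w_u(n) > 1, which is equivalent for positive window values and evaluates to false where w_u(n) = 0.

```python
import numpy as np

def rq_r(g, wu, N0):
    """Sum the dB excess of |g(n)| w_u(n) above the masking limit for n >= N0."""
    g = np.abs(np.asarray(g, dtype=float))
    wu = np.asarray(wu, dtype=float)
    weighted = g[N0:] * wu[N0:]
    exceed = weighted > 1.0            # taps above the temporal masking limit
    return float(np.sum(20.0 * np.log10(weighted[exceed])))

# Hypothetical toy example: only the tap at n = 2 exceeds the limit.
g = np.array([1.0, 0.5, 0.2, 0.01])
wu = np.array([0.0, 0.0, 10.0, 10.0])
value = rq_r(g, wu, N0=2)
```

If no tap exceeds the limit, the sum is empty and the function returns 0, which corresponds to a completely reshaped response.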
If the RIR is completely reshaped, then no time coefficient exceeds the
temporal masking limit and $RQ_r = 0$. Figure 3 shows the $RQ_r$ value as a
function of the system misalignment of the AEC for the noise and the speech
input signal. Note that the x-axis is flipped, so that high values of
$D_{\mathrm{dB}}$, which indicate bad convergence, are on the left and
smaller values, indicating good convergence, are on the right. The dashed
horizontal line at $RQ_r = 5218$ indicates the unequalized RIR. When the
equalizer has been designed with perfect knowledge of the RIR, one reaches
$RQ_r = 0.96$.
Figure 3: $RQ_r$ value of the equalized system depending on the degree of
system identification measured by $D_{\mathrm{dB}}$. (Plot of
$RQ_r(D_{\mathrm{dB}})$ over the relative system misalignment
$D_{\mathrm{dB}}$, x-axis from 0 down to −20; curves: noise input, speech
input, unreshaped RIR.)
Generally, it can be seen that a high system misalignment results in poor
dereverberation performance. With speech as the input signal,
dereverberation can be achieved when the misalignment of the system estimate
is below −5.06 dB. With the noise input signal, the misalignment must be
lower than −12.1 dB to reduce the audible reverberation.
Conclusions
In this contribution we analyzed the mutual influence of two
video-conferencing subsystems: listening-room compensation based on p-norm
optimization, and acoustic echo cancellation. The quality of the system
identification was shown as a function of the additional coloration of the
loudspeaker signal introduced by the equalizer. Furthermore, the performance
of the audible echo reduction was analyzed as a function of the degree of
system identification.
References
[1] L. D. Fielder. Analysis of traditional and reverberation-reducing
methods for room equalization. J. Audio Eng. Soc., 51:3–26, 2003.
[2] T. Mei and A. Mertins. On the robustness of room impulse response
reshaping. In Proc. International Workshop on Acoustic Echo and Noise
Control (IWAENC), Tel Aviv, Israel, Aug. 2010.
[3] A. Mertins, T. Mei, and M. Kallinger. Room impulse response
shortening/reshaping with infinity- and p-norm optimization. IEEE
Transactions on Audio, Speech, and Language Processing, 18(2):249–259, 2010.