Room Impulse Response Reshaping by p-Norm Optimization based on Estimates of
Room Impulse Responses
Jan Ole Jungmann¹, Stefan Goetze², Alfred Mertins¹
¹University of Lübeck, Institute for Signal Processing, D-23538 Lübeck, Email: {jungmann, mertins}@isip.uni-luebeck.de
²Fraunhofer Institute for Digital Media Technology, D-26129 Oldenburg, Email: s.goetze@idmt.fraunhofer.de
Abstract
Hands-free telecommunication raises several real-world
problems, such as corruption of the desired signal by ad-
ditive noise, acoustic echoes, and reverberation. This pa-
per addresses the mutual impacts of the subsystems for
Acoustic Echo Cancellation (AEC) and Listening Room
Compensation (LRC) based on p-norm optimization. In
acoustic systems for LRC the equalizer is placed in front
of the loudspeaker. An estimate of the room impulse re-
sponse (RIR) is necessary for the equalizer to compensate
for the influence of the RIR at the position of the refer-
ence microphone where the human user is assumed to be
located. Since the RIR is identified by the acoustic echo
canceller anyway, its estimate can be used to design the
equalizer. The quality of dereverberation as a function
of the degree of system identification is investigated
in this contribution. Furthermore, the influence of the
equalizer on the AEC is analyzed.
RIR Reshaping by p-Norm Optimization
In [3] the well-known least-squares based design rule to
build an equalizer that renders certain properties on the
global impulse response has been generalized by intro-
ducing a p-norm based objective function. Two weight-
ing windows for the desired and the unwanted part of
the global impulse response are defined. The optimiza-
tion problem reads
$$\min_{h}: \; f(h) = \log \frac{\|g_u\|_{p_u}}{\|g_d\|_{p_d}}, \qquad (1)$$
with $h$ being the equalizer one aims to design,
$g_d = \mathrm{diag}\{w_d\}\, C h$ being the desired part of the global
impulse response of length $L_g$ ($g_u$ accordingly), $C$ being
the convolution matrix made up of the RIR $c(n)$, $\|\cdot\|_p$
denoting the p-norm of a vector, and $\mathrm{diag}\{\cdot\}$ transforming
a vector into a diagonal matrix. The optimal solution is
approximated by applying a gradient-descent procedure.
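The objective in (1) and its gradient can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the helper names (`convolution_matrix`, `objective`, `gradient`, `descend`), the step size, and the step-halving rule are assumptions made here, since the text only states that a gradient-descent procedure is used.

```python
import numpy as np

def convolution_matrix(c, Lh):
    """Return the (Lc + Lh - 1) x Lh matrix C with C @ h == np.convolve(c, h)."""
    Lc = len(c)
    C = np.zeros((Lc + Lh - 1, Lh))
    for k in range(Lh):
        C[k:k + Lc, k] = c
    return C

def p_norm(x, p):
    return np.sum(np.abs(x) ** p) ** (1.0 / p)

def objective(h, C, wd, wu, pd, pu):
    """f(h) = log(||g_u||_pu / ||g_d||_pd) with g_d = diag{w_d} C h, g_u = diag{w_u} C h."""
    g = C @ h
    return np.log(p_norm(wu * g, pu) / p_norm(wd * g, pd))

def gradient(h, C, wd, wu, pd, pu):
    """Gradient of f; uses d/dx log||x||_p = sign(x) |x|^(p-1) / sum(|x|^p)."""
    g = C @ h
    gu, gd = wu * g, wd * g
    du = C.T @ (wu * np.sign(gu) * np.abs(gu) ** (pu - 1)) / np.sum(np.abs(gu) ** pu)
    dd = C.T @ (wd * np.sign(gd) * np.abs(gd) ** (pd - 1)) / np.sum(np.abs(gd) ** pd)
    return du - dd

def descend(h, C, wd, wu, pd, pu, steps=50, mu=1e-2):
    """Gradient descent with step halving (the halving rule is an assumption)."""
    f = objective(h, C, wd, wu, pd, pu)
    for _ in range(steps):
        grad = gradient(h, C, wd, wu, pd, pu)
        step = mu
        while step > 1e-12:
            h_new = h - step * grad
            f_new = objective(h_new, C, wd, wu, pd, pu)
            if f_new < f:          # accept only steps that lower the objective
                h, f = h_new, f_new
                break
            step *= 0.5
    return h
```

For large exponents such as $p_u = 20$, the powers can under- or overflow in floating point; scaling $g$ by its maximum magnitude before exponentiation is one possible remedy, since $f(h)$ is invariant to a common scaling of $h$.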
Design of the Weighting Windows
Mertins et al. proposed to exploit psychoacoustic findings
to reduce the audible echoes and introduced weighting
functions that capture the compromise temporal masking
limit of the human auditory system [3]. The definitions
for the windows read as follows:
$$w_d = [\,\underbrace{0, 0, \dots, 0}_{N_1},\; \underbrace{1, 1, \dots, 1}_{N_2},\; \underbrace{0, 0, \dots, 0}_{N_3}\,]^T \qquad (2)$$
and
$$w_u = [\,\underbrace{0, 0, \dots, 0}_{N_1 + N_2},\; \underbrace{w_0^T}_{N_3}\,]^T \qquad (3)$$
where $N_1 = t_0 f_s$, $N_2 = 0.004\,\mathrm{s} \cdot f_s$, and $N_3 = L_g - N_1 - N_2$,
with $f_s$ being the sampling frequency and $t_0$ being the
time taken by the direct sound, respectively. The window
$w_0$, with its reciprocal being the compromise masking
limit according to [1], is defined as
$$w_0(n) = 10^{\frac{3}{\log(N_0/(N_1+N_2))} \log\left(\frac{n}{N_1+N_2}\right) + 0.5} \qquad (4)$$
with $N_0 = (0.2\,\mathrm{s} + t_0) f_s$ and time index $n$ ranging from
$N_1 + N_2 + 1$ to $L_g - 1$.
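The window definitions (2) to (4) translate directly into code. The following sketch assumes zero-based indexing and rounds the sample counts to integers. The function name `weighting_windows` is hypothetical. One deviation is an assumption made here: the stated index range $N_1+N_2+1$ to $L_g-1$ yields only $N_3-1$ values, so the sketch starts $w_0$ at $n = N_1+N_2$ in order to fill all $N_3$ entries of $w_u$.

```python
import numpy as np

def weighting_windows(Lg, fs, t0):
    """Return (w_d, w_u) of length Lg following (2)-(4).

    fs is the sampling frequency in Hz, t0 the direct-sound delay in seconds.
    """
    N1 = int(round(t0 * fs))
    N2 = int(round(0.004 * fs))          # 4 ms after the direct sound
    N3 = Lg - N1 - N2
    N0 = int(round((0.2 + t0) * fs))     # 200 ms after the direct sound

    w_d = np.concatenate([np.zeros(N1), np.ones(N2), np.zeros(N3)])

    # Assumption: n starts at N1 + N2 so that w_0 has exactly N3 entries.
    n = np.arange(N1 + N2, Lg)
    exponent = 3.0 / np.log10(N0 / (N1 + N2)) * np.log10(n / (N1 + N2)) + 0.5
    w_0 = 10.0 ** exponent

    w_u = np.concatenate([np.zeros(N1 + N2), w_0])
    return w_d, w_u
```

With this form, $w_0$ equals $10^{0.5}$ at $n = N_1 + N_2$ and $10^{3.5}$ at $n = N_0$, so its reciprocal drops from $-10$ dB to $-70$ dB over that span.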
Spectral Distortions
To attenuate an additional coloration of the loudspeaker
signal, we propose to weaken potential spectral distor-
tions introduced by the equalizer by applying a short
linear prediction error filter that is specifically designed
for the equalizer.
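One way to obtain such a prediction error filter is the autocorrelation method of linear prediction applied to the equalizer coefficients. The sketch below is an assumption about how this step could look, since the text does not specify the design; the function name `prediction_error_filter` is hypothetical and the filter length must be at least 2.

```python
import numpy as np

def prediction_error_filter(h, Lp):
    """Length-Lp prediction error filter [1, -a_1, ..., -a_{Lp-1}] for h.

    Solves the normal equations R a = r of the autocorrelation method.
    """
    order = Lp - 1
    N = len(h)
    r = np.correlate(h, h, mode="full")[N - 1:N + order]   # lags 0..order
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, r[1:order + 1])
    return np.concatenate(([1.0], -a))
```

Convolving the equalizer with this filter, for example `np.convolve(h, prediction_error_filter(h, 50))`, flattens its magnitude response at the cost of a slightly longer overall filter.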
System Identification
Since the AEC is an adaptive filter which identifies the
room impulse response, its estimate can be used for the
equalizer design. Figure 1 depicts the general setup of a
combined LRC/AEC scenario.
[Block diagram with the blocks EQ, RIR, and AEC and the signals s(n) and e(n).]
Figure 1: Combined system with LRC (EQ) and acoustic
echo canceller (AEC); RIR denotes the listening room,
containing the speaker and the microphone.
The RIR can be split up into one part $\hat{c}(n)$ that is
correctly identified by the AEC and an estimation error
$\tilde{c}(n)$, due to underestimating the order of the RIR and
insufficient convergence of the AEC:
$$c(n) = \hat{c}(n) + \tilde{c}(n). \qquad (5)$$
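To illustrate how an adaptive filter produces the estimate $\hat{c}(n)$, the following sketch uses a normalized LMS (NLMS) update. The adaptation algorithm is not named in the text, so NLMS, the step size `mu`, the regularization `eps`, and the function name `nlms_identify` are all assumptions chosen for illustration.

```python
import numpy as np

def nlms_identify(x, d, L, mu=0.5, eps=1e-8):
    """Identify an L-tap estimate c_hat from loudspeaker signal x and microphone signal d.

    Returns the final estimate and the error signal e(n).
    """
    c_hat = np.zeros(L)
    x_buf = np.zeros(L)                  # [x(n), x(n-1), ..., x(n-L+1)]
    e = np.zeros(len(x))
    for n in range(len(x)):
        x_buf = np.roll(x_buf, 1)
        x_buf[0] = x[n]
        e[n] = d[n] - c_hat @ x_buf      # echo-cancelled signal
        c_hat += mu * e[n] * x_buf / (x_buf @ x_buf + eps)
    return c_hat, e
```

If `L` is smaller than the length of the true RIR, the taps beyond `L` cannot be modelled and remain in the estimation error $\tilde{c}(n)$, which corresponds to the order underestimation named in the text.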
DAGA 2011 - Düsseldorf
611
Simulation Results
The RIR was simulated having a reverberation time of
$\tau_{60} = 200$ ms and a length of $L_c = 2000$ taps. The filter
orders of the AEC and the equalizer $h(k)$ were set to 1800
and 2000, respectively. As input signals, Gaussian white
noise and a recorded speech signal (female speaker) were
used. The equalizer was redesigned by optimizing the
p-norm based objective function (equation (1)) every 1000
updates of the AEC with the weighting windows defined
in (2) and (3); further, we set $p_d = 10$ and $p_u = 20$.
The length of the prediction error filter was set to
$L_p = 50$ taps.
AEC Convergence
The convergence of the AEC is influenced by the addi-
tional coloration introduced by the equalizer.
[Figure 2 plot: DdB(n) from 0 to −30 versus time index n (axis scaled by 10^5); curves: Noise input, EQ off; Noise input, EQ on; Speech input, EQ off; Speech input, EQ on.]
Figure 2: Relative System Misalignment DdB (n).
Figure 2 shows the relative system misalignment
$$D_{\mathrm{dB}}(n) = 10 \cdot \log_{10} \frac{\|c(n) - \hat{c}(n)\|^2}{\|c(n)\|^2} \qquad (6)$$
with the quadratic vector norm $\|c(n)\|^2 = c(n)^T c(n)$
for the two input signals s(n) (noise or speech) and for
the cases of active and inactive equalizer. In the case
of noise input, the use of the equalizer results in only a
slightly lower convergence rate and slightly degraded overall
system identification. With a speech signal as input,
the use of the equalizer decreases both the convergence
rate and the system identification performance.
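The misalignment in (6) can be computed as sketched below. Zero-padding a shorter estimate to the length of the true RIR is an assumption made here to cover the case of an AEC order (1800) below the RIR length (2000); the function name `misalignment_db` is hypothetical.

```python
import numpy as np

def misalignment_db(c, c_hat):
    """Relative system misalignment 10*log10(||c - c_hat||^2 / ||c||^2) in dB."""
    c = np.asarray(c, dtype=float)
    c_hat = np.asarray(c_hat, dtype=float)
    padded = np.zeros_like(c)
    padded[:len(c_hat)] = c_hat          # taps not modelled by the AEC count as zero
    err = c - padded                     # estimation error, c_tilde in (5)
    return 10.0 * np.log10((err @ err) / (c @ c))
```

An all-zero estimate gives 0 dB, and every tenfold reduction of the error energy lowers the value by 10 dB.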
Influence of the AEC on the EQ
For evaluation of the LRC subsystem we use a revised
version of the reverberation quantization (RQ) mea-
sure [2]. The proposed measure captures the audible re-
verberation by integrating the impulse response’s energy
that exceeds the compromise temporal masking limit on
a logarithmic scale:
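$$RQ_r = \sum_{n=N_0}^{L_g-1} g_{EM}(n) \qquad (7)$$
with
$$g_{EM}(n) = \begin{cases} 20 \cdot \log_{10}\left(|g(n)| \cdot w_u(n)\right), & |g(n)| > \frac{1}{w_u(n)} \\ 0, & \text{otherwise} \end{cases} \qquad (8)$$
and $N_0$ being the discrete time index that is 4 ms later
than the direct impulse of $g(n)$.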
If the RIR is completely reshaped, then no time coefficient
exceeds the temporal masking limit and $RQ_r = 0$.
Figure 3 shows the $RQ_r$ value as a function of the
system misalignment of the AEC for the noise and the
speech input signal. Note that the x-axis is flipped so
that high values of $D_{\mathrm{dB}}$, which indicate bad convergence,
are on the left and smaller values, indicating good
convergence, are on the right. The dashed horizontal
line at $RQ_r = 5218$ indicates the unequalized RIR. When
the equalizer has been designed with perfect knowledge
of the RIR, one reaches $RQ_r = 0.96$.
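A minimal sketch of the measure in (7) and (8) is given below. It assumes that `g` and `w_u` have the same length and that `N0` is passed as a zero-based sample index; the function name `rq_r` is hypothetical.

```python
import numpy as np

def rq_r(g, w_u, N0):
    """Sum of 20*log10(|g(n)| * w_u(n)) over all n >= N0 exceeding the masking limit."""
    x = np.abs(np.asarray(g, dtype=float)) * np.asarray(w_u, dtype=float)
    n = np.arange(len(x))
    # |g(n)| > 1 / w_u(n) is equivalent to |g(n)| * w_u(n) > 1 for w_u(n) > 0;
    # taps with w_u(n) = 0 never exceed the limit.
    exceeds = (n >= N0) & (x > 1.0)
    return float(np.sum(20.0 * np.log10(x[exceeds])))
```

Only taps above the masking limit contribute, each with a positive value in dB, so the sum is zero exactly when no tap from `N0` onward exceeds the limit.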
[Figure 3 plot: RQr(DdB) (axis ticks 2000 to 8000) versus relative system misalignment DdB (0 to −20, axis flipped); curves: Noise Input, Speech Input, Unreshaped RIR.]
Figure 3: $RQ_r$ value of the equalized system depending on
the degree of system identification measured by $D_{\mathrm{dB}}$.
Generally, it can be seen that a high system misalignment
results in bad dereverberation performance. By choos-
ing speech as the input signal, dereverberation can be
achieved when the misalignment of the system estimate
is below −5.06 dB. With the noise input signal, the mis-
alignment must be lower than −12.1 dB to reduce the
audible reverberations.
Conclusions
In this contribution we analyzed the mutual influences of
two video-conferencing subsystems: listening-room
compensation based on p-norm optimization, and
acoustic echo cancellation. The quality of the system
identification was shown as a function of the additional
coloration of the loudspeaker signal introduced by the
equalizer. Furthermore, the performance of the audible
echo reduction was analyzed as a function of the degree
of system identification.
References
[1] L. D. Fielder. Analysis of traditional and
reverberation-reducing methods for room equalization. J. Audio
Eng. Soc., 51:3–26, 2003.
[2] T. Mei and A. Mertins. On the robustness of
room impulse response reshaping. In Proc. International
Workshop on Acoustic Echo and Noise Control
(IWAENC), Tel Aviv, Israel, Aug. 2010.
[3] A. Mertins, T. Mei, and M. Kallinger. Room im-
pulse response shortening/reshaping with infinity-
and p-norm optimization. IEEE Transactions on Au-
dio, Speech, and Language Processing, 18(2):249–259,
2010.