Room Impulse Response Reshaping by p-Norm Optimization based on Estimates of Room Impulse Responses

Jan Ole Jungmann¹, Stefan Goetze², Alfred Mertins¹

¹University of Lübeck, Institute for Signal Processing, D-23538 Lübeck, Email: {jungmann, mertins}@isip.uni-luebeck.de

²Fraunhofer Institute for Digital Media Technology, D-26129 Oldenburg, Email: s.goetze@idmt.fraunhofer.de

Abstract

Hands-free telecommunication raises several real-world problems, such as corruption of the desired signal by additive noise, acoustic echoes, and reverberation. This paper addresses the mutual impact of the subsystems for Acoustic Echo Cancellation (AEC) and Listening Room Compensation (LRC) based on p-norm optimization. In acoustic systems for LRC, the equalizer is placed in front of the loudspeaker. An estimate of the room impulse response (RIR) is necessary for the equalizer to compensate for the influence of the RIR at the position of the reference microphone, where the human user is assumed to be located. Since the RIR is identified by the acoustic echo canceller anyway, its estimate can be used to design the equalizer. This contribution investigates how the quality of dereverberation depends on the degree of system identification. Furthermore, the influence of the equalizer on the AEC is analyzed.

RIR Reshaping by p-Norm Optimization

In [3], the well-known least-squares based design rule to build an equalizer that renders certain properties on the global impulse response has been generalized by introducing a p-norm based objective function. Two weighting windows for the desired and the unwanted part of the global impulse response are defined. The optimization problem reads

$$\min_{\mathbf{h}}:\; f(\mathbf{h}) = \log\left(\frac{\|\mathbf{g}_u\|_{p_u}}{\|\mathbf{g}_d\|_{p_d}}\right), \tag{1}$$

with $\mathbf{h}$ being the equalizer one aims to design, $\mathbf{g}_d = \mathrm{diag}\{\mathbf{w}_d\}\,\mathbf{C}\mathbf{h}$ being the desired part of the global impulse response of length $L_g$ ($\mathbf{g}_u$ accordingly), $\mathbf{C}$ being the convolution matrix made up of the RIR $c(n)$, $\|\cdot\|_p$ denoting the p-norm of a vector, and $\mathrm{diag}\{\cdot\}$ transforming a vector into a diagonal matrix. The optimal solution is approximated by applying a gradient-descent procedure.
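The objective in (1) and a gradient-descent loop can be sketched as follows. This is a minimal NumPy illustration, not the procedure used in [3]: the function names, the unit-impulse initialization, and the step-halving rule are assumptions made for the example, and the norms are normalized by the largest magnitude only to keep large exponents numerically stable.

```python
import numpy as np

def convolution_matrix(c, Lh):
    """Matrix C with C @ h == np.convolve(c, h) for an equalizer h of length Lh."""
    C = np.zeros((len(c) + Lh - 1, Lh))
    for k in range(Lh):
        C[k:k + len(c), k] = c
    return C

def log_pnorm_and_grad(x, p):
    """Return log(||x||_p) and its gradient w.r.t. x (x must not be all zero)."""
    m = np.max(np.abs(x))              # normalize to avoid overflow for large p
    r = np.abs(x) / m
    s = np.sum(r ** p)
    value = np.log(m) + np.log(s) / p
    grad = np.sign(x) * r ** (p - 1) / (m * s)
    return value, grad

def objective_and_gradient(h, C, w_d, w_u, p_d, p_u):
    """f(h) = log(||g_u||_{p_u} / ||g_d||_{p_d}) as in (1), plus its gradient."""
    g = C @ h                          # global impulse response
    f_u, d_u = log_pnorm_and_grad(w_u * g, p_u)
    f_d, d_d = log_pnorm_and_grad(w_d * g, p_d)
    grad = C.T @ (w_u * d_u - w_d * d_d)
    return f_u - f_d, grad

def design_equalizer(c, Lh, w_d, w_u, p_d=10, p_u=20, n_iter=200, step=1e-2):
    """Gradient descent with step halving (the step rule is an assumption)."""
    C = convolution_matrix(c, Lh)
    h = np.zeros(Lh)
    h[0] = 1.0                         # start from a unit impulse (assumption)
    f, grad = objective_and_gradient(h, C, w_d, w_u, p_d, p_u)
    for _ in range(n_iter):
        h_try = h - step * grad
        f_try, grad_try = objective_and_gradient(h_try, C, w_d, w_u, p_d, p_u)
        if f_try < f:                  # accept only steps that lower the objective
            h, f, grad = h_try, f_try, grad_try
        else:
            step *= 0.5
    return h, f
```

Because f(h) is a ratio of norms, it is invariant to the scaling of h, so the returned equalizer may be rescaled freely without changing the objective value.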

Design of the Weighting Windows

Mertins et al. proposed to exploit psychoacoustic findings to reduce the audible echoes and introduced weighting functions that capture the compromise temporal masking limit of the human auditory system [3]. The definitions for the windows read as follows:

$$\mathbf{w}_d = [\underbrace{0, 0, \ldots, 0}_{N_1}, \underbrace{1, 1, \ldots, 1}_{N_2}, \underbrace{0, 0, \ldots, 0}_{N_3}]^T \tag{2}$$

and

$$\mathbf{w}_u = [\underbrace{0, 0, \ldots, 0}_{N_1+N_2}, \underbrace{\mathbf{w}_0^T}_{N_3}]^T \tag{3}$$

where $N_1 = t_0 f_s$, $N_2 = 0.004\,\mathrm{s} \cdot f_s$, and $N_3 = L_g - N_1 - N_2$, with $f_s$ being the sampling frequency and $t_0$ being the time taken by the direct sound. The window $\mathbf{w}_0$, with its reciprocal being the compromise masking limit according to [1], is defined as

$$w_0(n) = 10^{\frac{3}{\log\left(N_0/(N_1+N_2)\right)} \log\left(\frac{n}{N_1+N_2}\right) + 0.5} \tag{4}$$

with $N_0 = (0.2\,\mathrm{s} + t_0) f_s$ and time index $n$ ranging from $N_1+N_2+1$ to $L_g-1$.
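As an illustration, the windows (2) to (4) could be built as in the following sketch. The function name and the rounding of $N_1$ and $N_2$ to integers are assumptions. To obtain exactly $N_3$ entries for $\mathbf{w}_0$, the sketch also evaluates (4) at $n = N_1+N_2$, one sample earlier than the index range stated above; this is a choice made for the example only. The ratio of logarithms in (4) does not depend on the base, so base 10 is used.

```python
import numpy as np

def weighting_windows(Lg, fs, t0):
    """Build w_d and w_u for a global impulse response of length Lg.

    fs is the sampling frequency in Hz, t0 the direct-sound delay in seconds.
    """
    N1 = int(round(t0 * fs))           # samples before the direct sound
    N2 = int(round(0.004 * fs))        # 4 ms desired part
    N3 = Lg - N1 - N2                  # remaining (unwanted) part
    N0 = (0.2 + t0) * fs

    w_d = np.concatenate([np.zeros(N1), np.ones(N2), np.zeros(N3)])

    # Assumption: evaluate (4) for n = N1+N2 ... Lg-1 so that w_0 has N3 entries.
    n = np.arange(N1 + N2, Lg)
    exponent = 3.0 / np.log10(N0 / (N1 + N2)) * np.log10(n / (N1 + N2)) + 0.5
    w_0 = 10.0 ** exponent

    w_u = np.concatenate([np.zeros(N1 + N2), w_0])
    return w_d, w_u
```

With this form, the exponent equals 0.5 at $n = N_1+N_2$ and 3.5 at $n = N_0$, so the reciprocal of the window falls from −10 dB to −70 dB between those two indices.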

Spectral Distortions

To attenuate an additional coloration of the loudspeaker signal, we propose to weaken potential spectral distortions introduced by the equalizer by applying a short linear prediction error filter that is specifically designed for the equalizer.
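The text does not specify how the prediction error filter is computed. One common option, shown here purely as an assumption, is the autocorrelation method of linear prediction applied to the equalizer coefficients; the function name is hypothetical.

```python
import numpy as np

def prediction_error_filter(h, order):
    """Prediction error filter [1, -a_1, ..., -a_P] for the sequence h.

    Uses the autocorrelation method (an assumption, not taken from the paper).
    Requires order < len(h) and h not all zero.
    """
    h = np.asarray(h, dtype=float)
    L = len(h)
    # Autocorrelation values for lags 0 ... order
    r = np.correlate(h, h, mode="full")[L - 1:L + order]
    # Toeplitz system of the normal equations R a = [r_1, ..., r_P]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, r[1:order + 1])
    return np.concatenate([[1.0], -a])
```

A spectrally flattened equalizer would then be obtained as `np.convolve(h, prediction_error_filter(h, 50))`, with 50 taps matching the value of $L_p$ reported in the simulation section.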

System Identification

Since the AEC is an adaptive filter which identifies the room impulse response, its estimate can be used for the equalizer design. Figure 1 depicts the general setup of a combined LRC/AEC scenario.

[Block diagram: s(n) → EQ → RIR → (+) → e(n), with the AEC output subtracted at the summing node.]

Figure 1: Combined system with LRC (EQ) and acoustic echo canceller (AEC); RIR denotes the listening room, containing the speaker and the microphone.

The RIR can be split up into one part $\hat{c}(n)$ that is correctly identified by the AEC and an estimation error $\tilde{c}(n)$, due to underestimating the order of the RIR and insufficient convergence of the AEC:

$$c(n) = \hat{c}(n) + \tilde{c}(n). \tag{5}$$

DAGA 2011 - Düsseldorf

611

Simulation Results

The RIR was simulated with a reverberation time of $\tau_{60} = 200$ ms and a length of $L_c = 2000$ taps. The filter orders of the AEC and the equalizer $h(k)$ were set to 1800 and 2000, respectively. Gaussian white noise and a recorded speech signal (female speaker) were used as input signals. The equalizer was redesigned by optimizing the p-norm based objective function (equation (1)) every 1000 updates of the AEC with the weighting windows defined in (2) and (3); furthermore, we set $p_d = 10$ and $p_u = 20$. The length of the prediction error filter was set to $L_p = 50$ taps.

AEC Convergence

The convergence of the AEC is influenced by the additional coloration introduced by the equalizer.

[Plot: relative system misalignment D_dB(n) in dB (axis from −30 to 0) over the time index n (axis scaled by 10^5); curves: noise input with EQ off, noise input with EQ on, speech input with EQ off, speech input with EQ on.]

Figure 2: Relative System Misalignment $D_{\mathrm{dB}}(n)$.

Figure 2 shows the relative system misalignment

$$D_{\mathrm{dB}}(n) = 10 \cdot \log_{10} \frac{\|\mathbf{c}(n) - \hat{\mathbf{c}}(n)\|^2}{\|\mathbf{c}(n)\|^2} \tag{6}$$

with the quadratic vector norm $\|\mathbf{c}(n)\|^2 = \mathbf{c}(n)^T \mathbf{c}(n)$ for the two input signals $s(n)$ (noise or speech) and for the cases of active and inactive equalizer. In the case of noise input, the use of the equalizer results in only a slightly lower convergence rate and slightly degraded overall system identification. With a speech signal as input, the use of the equalizer reduces both the convergence rate and the system identification performance.
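The misalignment in (6) is straightforward to evaluate for a given pair of true and estimated impulse responses. The sketch below is an illustration with a hypothetical function name; it assumes that an estimate shorter than the true RIR (an under-modelled filter order) is zero-padded, so that the missing tail counts fully as estimation error.

```python
import numpy as np

def relative_misalignment_db(c, c_hat):
    """Relative system misalignment D_dB of (6) for one time instant.

    Assumes len(c_hat) <= len(c); a shorter estimate is zero-padded.
    """
    c = np.asarray(c, dtype=float)
    c_hat = np.asarray(c_hat, dtype=float)
    if len(c_hat) < len(c):
        c_hat = np.concatenate([c_hat, np.zeros(len(c) - len(c_hat))])
    err = c - c_hat                    # estimation error, cf. (5)
    return 10.0 * np.log10(np.dot(err, err) / np.dot(c, c))
```

For example, an estimate that captures only the first two taps of `[1, 0.5, 0.25, 0.125]` leaves an error energy of 1/17 of the total, i.e. a misalignment of about −12.3 dB.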

Influence of the AEC on the EQ

For evaluation of the LRC subsystem we use a revised version of the reverberation quantization (RQ) measure [2]. The proposed measure captures the audible reverberation by integrating the impulse response's energy that exceeds the compromise temporal masking limit on a logarithmic scale:

$$RQ_r = \sum_{n=N_0}^{L_g-1} g_{EM}(n) \tag{7}$$

with

$$g_{EM}(n) = \begin{cases} 20 \cdot \log_{10}\left(|g(n)| \cdot w_u(n)\right), & |g(n)| > \frac{1}{w_u(n)} \\ 0, & \text{otherwise} \end{cases} \tag{8}$$

and $N_0$ being the discrete time index that is 4 ms later than the direct impulse of $g(n)$.
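A direct transcription of (7) and (8) might look like the following sketch. The function and argument names are hypothetical, and the start index $N_0$ is passed in by the caller rather than detected from $g(n)$.

```python
import numpy as np

def rq_measure(g, w_u, n0):
    """Revised reverberation quantization measure RQ_r of (7) and (8).

    g is the global impulse response, w_u the unwanted-part window,
    n0 the index 4 ms after the direct impulse.
    """
    g_abs = np.abs(np.asarray(g, dtype=float))[n0:]
    w = np.asarray(w_u, dtype=float)[n0:]
    weighted = g_abs * w
    # |g(n)| * w_u(n) > 1 is the same condition as |g(n)| > 1 / w_u(n)
    # for w_u(n) > 0; taps with w_u(n) = 0 never exceed the limit.
    exceeds = weighted > 1.0
    return float(np.sum(20.0 * np.log10(weighted[exceeds])))
```

Only taps above the masking limit contribute, each with a positive dB value, so the measure is zero exactly when no tap from `n0` onwards exceeds the limit.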

If the RIR is completely reshaped, then no time coefficient exceeds the temporal masking limit and $RQ_r = 0$. Figure 3 shows the $RQ_r$ value as a function of the system misalignment of the AEC for the noise and the speech input signal. Note that the x-axis is flipped, so that high values of $D_{\mathrm{dB}}$, which indicate bad convergence, are on the left and smaller values, which indicate good convergence, are on the right. The dashed horizontal line at $RQ_r = 5218$ indicates the unequalized RIR. When the equalizer has been designed with perfect knowledge of the RIR, one reaches $RQ_r = 0.96$.

[Plot: RQ_r(D_dB) (axis ticks from 2000 to 8000) over the relative system misalignment D_dB (axis from 0 down to −20); curves: noise input, speech input, unreshaped RIR.]

Figure 3: $RQ_r$ value of the equalized system depending on the degree of system identification measured by $D_{\mathrm{dB}}$.

Generally, it can be seen that a high system misalignment results in bad dereverberation performance. With speech as the input signal, dereverberation can be achieved when the misalignment of the system estimate is below −5.06 dB. With the noise input signal, the misalignment must be lower than −12.1 dB to reduce the audible reverberation.

Conclusions

In this contribution we analyzed the mutual influences of two video-conferencing subsystems: listening-room compensation based on p-norm optimization and acoustic echo cancellation. The quality of the system identification was shown as a function of the additional coloration of the loudspeaker signal introduced by the equalizer. Furthermore, the performance of the audible echo reduction was analyzed as a function of the degree of system identification.

References

[1] L. D. Fielder. Analysis of traditional and reverberation-reducing methods for room equalization. J. Audio Eng. Soc., 51:3–26, 2003.

[2] T. Mei and A. Mertins. On the robustness of room impulse response reshaping. In Proc. International Workshop on Acoustic Echo and Noise Control (IWAENC), Tel Aviv, Israel, Aug. 2010.

[3] A. Mertins, T. Mei, and M. Kallinger. Room impulse response shortening/reshaping with infinity- and p-norm optimization. IEEE Transactions on Audio, Speech, and Language Processing, 18(2):249–259, 2010.
