A Clipping-Based Selective-Tap Adaptive Filtering Approach to Stereophonic Acoustic Echo Cancellation
ABSTRACT Stereophonic acoustic echo cancellation remains one of the challenging areas for tele/videoconferencing applications. However, the existence of high interchannel coherence between the two input signals for such systems leads to considerable degradation in misalignment convergence of the adaptive filters. We propose a new algorithm for improving the convergence performance and steady-state misalignment by considering robustness to the source position in the transmission room. We achieve this by exploiting the inherent decorrelating properties of selective-tap adaptive filtering as well as employing a variable clipping threshold for the unselected taps. Simulation results using colored noise and speech signals show an improvement over existing algorithms both in terms of convergence rate as well as steady-state normalized misalignment.

1826 IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 19, NO. 6, AUGUST 2011
Mehdi Bekrani, Andy W. H. Khong, Member, IEEE, and Mojtaba Lotfizad
Index Terms—Center clipping, convergence rate, interchannel
coherence, partial updating, selective-tap, stereophonic acoustic echo cancellation (SAEC).
Manuscript received October 09, 2009; revised June 18, 2010; accepted November 29, 2010. Date of publication February 10, 2011; date of current version June 03, 2011. This work was supported in part by the Singapore National Research Foundation Interactive Digital Media R&D Program under research grant NRF2008IDMIDM004010 and in part by the Research Institute for ICT. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Sharon Gannot.
M. Bekrani was with Tarbiat Modares University, Tehran, Iran. He is now with the School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore 639798 (e-mail: mbekrani@ntu.edu.sg).
A. W. H. Khong is with Nanyang Technological University, Singapore 639798 (e-mail: andykhong@ntu.edu.sg).
M. Lotfizad is with the Department of Electrical and Computer Engineering, Tarbiat Modares University, P.O. Box 14115143, Tehran, Iran (e-mail: lotfizad@modares.ac.ir).
Digital Object Identifier 10.1109/TASL.2010.2102752
I. INTRODUCTION
There has been increasing interest in employing stereophonic audio communication systems for video/teleconferencing, home entertainment, and E-learning applications in order to achieve better perception of sound [1], [2]. Such systems have become increasingly popular since a stereophonic audio system provides spatial information, leading to better perception of the transmitted speech as well as improving the ambience of the transmission room. These systems mitigate, to a certain extent, the cocktail party problem that exists in a multiparty conferencing scenario [2]. One of the problems that should be addressed in such systems is the cancellation of stereophonic acoustic echo using a pair of adaptive filters. Stereophonic acoustic echo cancellation (SAEC) has issues that are considerably more challenging to overcome than the monophonic case [3]. The fundamental problem is the mismatch between the adaptive filter coefficients and the receiving room acoustic impulse responses. It has been shown [4] that, in a practical scenario, the adaptive filter misalignment converges poorly, leading to a performance degradation. The misalignment problem is caused by the high interchannel coherence that exists between the two transmitted stereo signals [4].
A variety of methods have been proposed to address the misalignment problem. These methods revolve around reducing the interchannel coherence between the two transmitted signals. One of the first methods involved adding controlled quantities of independent noise to each input channel [5], or modulating them [6]. These two approaches are, however, not feasible since distortion can be heard even when the added noise level is very low [7]. Another approach involves the use of comb filtering [8], where frequency components of the left and right channels are separated in order to reduce the interchannel coherence. Although this method improves the performance of the adaptive filters, it degrades the quality of the stereophonic sound, especially at lower frequencies.
To address the disadvantages of the methods described above, a preprocessor has been proposed to add a nonlinear (NL) function of the transmitted signal in each channel to the signal itself [4], [9]. This method employs a half-wave rectifier and is attractive in terms of improving the misalignment behavior of the adaptive filters as well as its simplicity in implementation. Although less distortion was introduced compared to the other techniques discussed above, this distortion is still found to be objectionable in some cases for music applications [4], [10]. An adaptive nonlinearity control was subsequently proposed to maintain the desired level of misalignment and to minimize the audio distortion [11].
It is apparent by now that algorithms proposed for SAEC need to decorrelate the transmitted signals without degrading the quality of the speech signals or destroying the stereophonic image of the transmission room. In view of this, the use of time-varying all-pass filtering of the stereophonic signals has been proposed [12] with the aim of signal decorrelation while maintaining the stereophonic perception. The use of psychoacoustic properties to reduce perceived distortion while achieving signal decorrelation has also been proposed, including the use of spectrally shaped random noise [7], [13], gain-controlled phase distortion [14], and the combination of comb filtering and all-pass filtering with respect to the masking effect [15]. These methods exploit a perceptual property of the human auditory system, called “noise masking.”
15587916/$26.00 © 2010 IEEE
BEKRANI et al.: CLIPPING-BASED SELECTIVE-TAP ADAPTIVE FILTERING APPROACH TO SAEC 1827
More recent advances in SAEC research involve a decorrelation procedure for the adaptive weight update [10], [16]–[19]. These approaches decorrelate the tap-input vectors of the adaptive filters as opposed to decorrelating the transmitted signals. Among these methods, the exclusive-maximum (XM) tap-selection algorithm proposed in [17] appeared to be an attractive solution. This algorithm achieves update signal decorrelation by ensuring that only an exclusive set of filter coefficients from each channel is selected for adaptation. In order to reduce any degradation in convergence performance due to this subselection procedure, the XM tap-selection strategy further ensures that the energies of these exclusive tap inputs are maximized. The XM tap selection has been incorporated with the NL half-wave rectifier and the resulting XM-NL normalized least-mean-square (XM-NL-NLMS) algorithm has been shown to achieve a higher rate of misalignment convergence compared to that of nonlinear NLMS (NL-NLMS) [17].
We propose to further improve the convergence performance of the XM tap-selection algorithm. This motivation is derived from the degradation in convergence performance of the XM tap-selection algorithm when the interchannel coherence between the two tap-input vectors is relatively low. This can occur, for example, when the source in the transmission room is located away from the centroid of the stereophonic microphone pair. We note that the robustness issue of XM-NL-NLMS to the source position has not been investigated and that, in this work, we present insight into this problem. Utilizing this new knowledge, we propose to improve the misalignment convergence of XM-NL-NLMS by employing a center-clipping algorithm so that the low interchannel coherence and maximization of tap-input energy criteria can be jointly optimized. The proposed algorithm ensures that the misalignment convergence and steady-state misalignment of the adaptive filters will be robust to the source position in the transmission room.
In [7] and references therein, the authors evaluated the adaptive signal decorrelation filter as a preprocessor and reported that complete decorrelation in the frequency domain cannot be achieved unless one or both of the stereophonic signals are zero at every frequency. This process is undesirable since it destroys the stereophonic image of the transmitted signals, which is important to the listeners in the receiving room. Therefore, they concluded that complete decorrelation is not applicable in practice. Our proposed method, as opposed to the technique discussed in [7], does not apply decorrelation filtering to the transmitted signals. We instead operate on the tap-input vector of the adaptive filters. Therefore, similar to XM-NL-NLMS, the stereophonic image is preserved. We also note that the decorrelated tap-input vectors of both algorithms may prevent some of the adaptive filter coefficients from adaptation in some iterations. However, this effect will not significantly degrade the convergence of weights since any significant reduction in the interchannel coherence due to our center-clipping approach will bring about an improvement in convergence rate. It is also important to note that, similar to the approach in [17], the use of the NL preprocessor is required to provide a solution to the ill-posed SAEC problem, while our proposed center-clipping approach improves the convergence rate and robustness of XM-NL-NLMS to the source position.
Fig. 1. Stereophonic acoustic echo cancellation for teleconferencing application.
II. REVIEW OF STEREOPHONIC ACOUSTIC ECHO
CANCELLATION
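Fig. 1 shows the stereophonic acoustic echo canceller. For simplicity, we consider only one microphone in the receiving room, since similar analysis can be applied to the other channel [4]. Microphones in the transmission room receive signals produced by the sound source $s(n)$ via acoustic impulse responses $\mathbf{g}_1$ and $\mathbf{g}_2$, giving transmitted signals $x_1(n)$ and $x_2(n)$, respectively, where $\mathbf{g}_j = [g_{j,0}, g_{j,1}, \ldots, g_{j,L_T-1}]^T$ for Channel $j$ and $L_T$ is the length of the transmission room impulse responses, while $[\cdot]^T$ is defined as the transpose operator. The transmitted signal to the receiving room for Channel $j$ can then be expressed as

$$x_j(n) = \mathbf{g}_j^T \mathbf{s}(n) \tag{1}$$

where $\mathbf{s}(n) = [s(n), s(n-1), \ldots, s(n-L_T+1)]^T$. These signals produce an echo $y(n)$ in the receiving room given by

$$y(n) = \sum_{j=1}^{2} \mathbf{h}_j^T \mathbf{x}_j(n) \tag{2}$$

where $\mathbf{h}_j = [h_{j,0}, h_{j,1}, \ldots, h_{j,L-1}]^T$ is the $j$th channel receiving room impulse response and $\mathbf{x}_j(n) = [x_j(n), x_j(n-1), \ldots, x_j(n-L+1)]^T$ is the $j$th channel tap-input vector while $L$ is the length of $\mathbf{h}_j$.
Similar to single-channel AEC, adaptive filters are employed to estimate $\mathbf{h}_1$ and $\mathbf{h}_2$. In this paper, we assume, similar to that of [4], [17], that the adaptive filters are each of length $L$, which is of the same length as that of $\mathbf{h}_1$ and $\mathbf{h}_2$. For realistic applications where the adaptive filters are shorter than that of $\mathbf{h}_j$, residual echo will be transmitted back to the transmission room due to the unmodeled “tails” of $\mathbf{h}_j$. The error between the echo and its estimate can then be expressed as

$$e(n) = y(n) - \sum_{j=1}^{2} \hat{\mathbf{h}}_j^T(n-1)\,\mathbf{x}_j(n) \tag{3}$$

where $\hat{\mathbf{h}}_j(n) = [\hat{h}_{j,0}(n), \hat{h}_{j,1}(n), \ldots, \hat{h}_{j,L-1}(n)]^T$, $j = 1, 2$, is the vector of adaptive filter coefficients for the $j$th channel.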
A. Nonlinear Normalized Least-Mean-Square Algorithm
In order to efficiently reduce $e(n)$, adaptive algorithms are employed for SAEC. Defining $E\{\cdot\}$ as the expectation operator, the NLMS algorithm [20] is the result of minimizing $E\{e^2(n)\}$, and is popular because of its computational efficiency and ease of implementation. The two-channel NLMS algorithm is expressed as

$$\hat{\mathbf{h}}(n) = \hat{\mathbf{h}}(n-1) + \frac{\mu\,\mathbf{x}(n)\,e(n)}{\|\mathbf{x}(n)\|^2 + \delta} \tag{4}$$

where $\mathbf{x}(n) = [\mathbf{x}_1^T(n)\ \mathbf{x}_2^T(n)]^T$ and $\hat{\mathbf{h}}(n) = [\hat{\mathbf{h}}_1^T(n)\ \hat{\mathbf{h}}_2^T(n)]^T$ are the concatenated two-channel tap-input vector and filter coefficient vector, respectively, while $\mu$ is the step-size, which controls the rate of convergence, and $\delta$ is a regularization parameter to prevent division by zero.
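As a rough illustration of the two-channel NLMS recursion described above, the following Python sketch performs one iteration on the concatenated tap-input and coefficient vectors. It assumes the standard regularized NLMS form with an a priori error; the function name `nlms_update` and the default values of `mu` and `delta` are illustrative choices and are not taken from the paper.

```python
from typing import List, Tuple


def nlms_update(h_hat: List[float], x: List[float], y: float,
                mu: float = 0.5, delta: float = 1e-6) -> Tuple[List[float], float]:
    """Run one regularized NLMS iteration; return (new coefficients, error)."""
    # A priori error between the observed echo and its current estimate.
    e = y - sum(h * xi for h, xi in zip(h_hat, x))
    # Input energy plus delta, so the division below is always defined.
    energy = sum(xi * xi for xi in x) + delta
    h_new = [h + mu * xi * e / energy for h, xi in zip(h_hat, x)]
    return h_new, e


# Usage: two channels of length 2, concatenated into vectors of length 4.
h_hat, err = nlms_update([0.0] * 4, [1.0, 0.0, 0.0, 0.0], y=2.0, mu=1.0, delta=0.0)
```

With a unit step-size and no regularization, a single iteration drives the error for the same input to zero, which is the usual sanity check for an NLMS implementation.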
Unlike single-channel AEC, estimation of $\mathbf{h}_1$ and $\mathbf{h}_2$ for the stereophonic case is challenging. As shown in [4], when $L \geq L_T$, the adaptive filter coefficients are of the form

$$\hat{\mathbf{h}}_1 = \mathbf{h}_1 + \beta\,\tilde{\mathbf{g}}_2, \qquad \hat{\mathbf{h}}_2 = \mathbf{h}_2 - \beta\,\tilde{\mathbf{g}}_1 \tag{5}$$

where $\tilde{\mathbf{g}}_j$, $j = 1, 2$, are vectors given that $\mathbf{g}_j$ is appended with $L - L_T$ zeros and $\beta$ is a scalar quantity. Equation (5) indicates that multiple solutions exist and that $\hat{\mathbf{h}}_1$ and $\hat{\mathbf{h}}_2$ are linearly related to $\mathbf{g}_2$ and $\mathbf{g}_1$. The dependency of $\hat{\mathbf{h}}_1$ and $\hat{\mathbf{h}}_2$ on these multiple solutions due to $\mathbf{g}_1$ and $\mathbf{g}_2$ is known as the nonuniqueness problem. In practical cases where $L < L_T$, the nonuniqueness problem is mitigated [4]. However, even in such cases, direct application of standard adaptive filtering is normally not successful because $x_1(n)$ and $x_2(n)$ are highly correlated, giving rise to an ill-conditioned system identification problem. As a result, the misalignment convergence of the adaptive filters is impaired significantly. This degradation is known as the misalignment problem [4].
In order to address the misalignment problem, a nonlinear (NL) preprocessor is proposed [4], [21]. This preprocessor operates on $x_1(n)$ and $x_2(n)$ such that the modified transmitted signals $x'_1(n)$ and $x'_2(n)$ are given by

$$x'_1(n) = x_1(n) + 0.5\,\alpha\,[x_1(n) + |x_1(n)|] \tag{6}$$

$$x'_2(n) = x_2(n) + 0.5\,\alpha\,[x_2(n) - |x_2(n)|] \tag{7}$$

where $\alpha$ controls the amount of nonlinearity to be added. It has been shown in [4] that a value of $\alpha = 0.5$ is a good compromise between speech quality and misalignment convergence of the NLMS algorithm.
Due to its simplicity in implementation and the low distortion introduced, the NL preprocessor has become an intrinsic part of SAEC and has been incorporated into several recently proposed algorithms for SAEC [10], [17]. For the remainder of this paper, the NLMS algorithm employing this NL preprocessing will be referred to as NL-NLMS.
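The NL preprocessor can be sketched in a few lines of Python. This is a minimal illustration that assumes the positive and negative half-wave rectifier form usually associated with [4], in which the first channel is boosted on positive samples and the second channel on negative samples; the function name `nl_preprocess` and the default `alpha` are illustrative assumptions.

```python
from typing import Tuple


def nl_preprocess(x1: float, x2: float, alpha: float = 0.5) -> Tuple[float, float]:
    """Apply half-wave rectifier nonlinearities to one stereo sample pair."""
    # Channel 1: add a scaled positive half-wave rectified copy of the sample.
    x1_nl = x1 + 0.5 * alpha * (x1 + abs(x1))
    # Channel 2: add a scaled negative half-wave rectified copy of the sample.
    x2_nl = x2 + 0.5 * alpha * (x2 - abs(x2))
    return x1_nl, x2_nl


# Usage: a positive pair only changes channel 1; a negative pair only channel 2.
pos = nl_preprocess(1.0, 1.0)
neg = nl_preprocess(-1.0, -1.0)
```

Because the two channels are distorted on opposite half-cycles, the processed signals are no longer related by a purely linear filter, which is what relaxes the nonuniqueness of the solution.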
B. Exclusive-Maximum (XM) Tap-Selection Algorithm
The XM-NL-NLMS update [17] can be expressed as

$$\hat{\mathbf{h}}_j(n) = \hat{\mathbf{h}}_j(n-1) + \frac{\mu\,\mathbf{Q}_j(n)\,\mathbf{x}'_j(n)\,e(n)}{\|\mathbf{x}'(n)\|^2 + \delta} \tag{8}$$

where $j = 1, 2$ is the channel index, $\mathbf{x}'_j(n)$ is the NL-preprocessed tap-input vector of the $j$th channel defined by (6) and (7), $\mathbf{x}'(n) = [\mathbf{x}'^T_1(n)\ \mathbf{x}'^T_2(n)]^T$, while $\mathbf{Q}_j(n) = \mathrm{diag}\{\mathbf{q}_j(n)\}$ is a tap-selection matrix and $\mathbf{q}_j(n)$ is a vector with elements given by

$$q_{1,u}(n) = \begin{cases} 1, & p_u(n) \in \{M \text{ maxima of } \mathbf{p}(n)\} \\ 0, & \text{otherwise} \end{cases} \tag{9}$$

$$q_{2,v}(n) = \begin{cases} 1, & p_v(n) \in \{M \text{ minima of } \mathbf{p}(n)\} \\ 0, & \text{otherwise} \end{cases} \tag{10}$$

where $u, v = 1, \ldots, L$ denote the elemental indices of $\mathbf{q}_1(n)$ and $\mathbf{q}_2(n)$, respectively, and $\mathbf{p}(n) = |\mathbf{x}'_1(n)| - |\mathbf{x}'_2(n)|$ and $M = 0.5L$.
Fig. 2. Locations of the source and the microphone pair in the transmission room.
As can be seen, the XM algorithm [17] incorporates a tap-selection scheme that reduces the interchannel coherence by selecting exclusive filter coefficients for updating in each channel. It is important to note that the selected tap inputs are only used for updating the coefficients and hence no distortion is introduced. However, with any tap-selection updating strategy, convergence performance of the adaptive filters will be reduced. This degradation is then minimized by jointly maximizing the norm of the selected tap inputs across both channels. The use of the NL preprocessor is required to provide a solution to the ill-posed SAEC problem, while the XM approach improves the convergence rate of NL-NLMS. As a result of this combination, which we refer to as XM-NL, better misalignment convergence of the adaptive filter can be achieved. Alternatively, the XM tap selection can be seen as an effective approach to achieve good misalignment convergence with lower distortion brought about by a smaller nonlinearity factor $\alpha$.
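The exclusive tap selection described above can be sketched as follows. The sketch assumes the rule usually given for XM selection: form the element-wise difference of magnitudes between the two tap-input vectors, give the first channel the half of the taps with the largest differences and the second channel the half with the smallest. The function name `xm_select` and the tie-breaking by sort order are assumptions made for illustration.

```python
from typing import List, Tuple


def xm_select(x1: List[float], x2: List[float]) -> Tuple[List[int], List[int]]:
    """Return 0/1 selection vectors (q1, q2) with no tap selected in both channels."""
    length = len(x1)
    m = length // 2  # number of taps updated per channel
    # Difference of magnitudes between corresponding tap inputs.
    p = [abs(a) - abs(b) for a, b in zip(x1, x2)]
    order = sorted(range(length), key=lambda i: p[i])  # ascending in p
    minima = set(order[:m])            # channel 2 takes the M smallest p
    maxima = set(order[length - m:])   # channel 1 takes the M largest p
    q1 = [1 if i in maxima else 0 for i in range(length)]
    q2 = [1 if i in minima else 0 for i in range(length)]
    return q1, q2


# Usage with two short tap-input vectors.
q1, q2 = xm_select([3.0, -1.0, 0.5, 2.0], [1.0, -2.0, 0.4, -5.0])
```

For an even vector length every tap index is selected in exactly one channel, which is the exclusivity property that lowers the coherence between the two update vectors.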
III. EFFECT OF SOURCE POSITION ON MISALIGNMENT
CONVERGENCE
One of the problems that has yet to be considered for XM-NL-NLMS is its robustness to the position of the source in the transmission room. We now illustrate the misalignment convergence of XM-NL-NLMS by considering different source positions. Fig. 2 shows an experimental setup where two microphones are placed at positions (2.7, 2, 1.5) m and (3, 2, 1.5) m in a room with dimensions 7 m × 7 m × 4 m. We vary the $x$ position of the source starting from the front of the array centroid at $x = 2.85$ m to the front of the right microphone at $x = 3$ m. The positions in the receiving room are (3, 2, 1.5) m for the microphone and (2.85, 1.8, 1.6) m and (2.4, 1.1, 1.7) m for the two loudspeakers. For the purpose of illustration, all room impulse responses are generated synthetically using the method of images [22], and these synthetic impulse responses have lengths that correspond to their reverberation times. At the sampling rate used, these lengths correspond to 23 ms and 36 ms, respectively. A stationary colored noise source signal $s(n)$ is obtained by filtering white Gaussian noise through a lowpass finite impulse response (FIR) filter [23], which was chosen to generate a speech-like spectrum. The convergence of the algorithms is quantified by the normalized misalignment

$$\eta(n) = \frac{\|\mathbf{h} - \hat{\mathbf{h}}(n)\|^2}{\|\mathbf{h}\|^2} \tag{11}$$

where $\mathbf{h} = [\mathbf{h}_1^T\ \mathbf{h}_2^T]^T$. Fig. 3 shows the misalignment convergence of XM-NL-NLMS averaged over ten independent trials, for the different source positions. The convergence performance of NL-NLMS for $x = 2.85$ m, when the source is directly in front of the microphone pair centroid, has been included for comparison. Additional tests conducted have shown that the misalignment convergence of NL-NLMS for various source positions is comparable to that shown in Fig. 3. The step-size of NL-NLMS was adjusted so that its steady-state misalignment reaches that of XM-NL-NLMS. As shown in Fig. 3, XM-NL-NLMS outperforms the full-update NL-NLMS when the source is directly in front of the microphone pair centroid at $x = 2.85$ m. On the contrary, the convergence rate of XM-NL-NLMS is reduced significantly when the source is located away from the microphone pair centroid.
Fig. 3. Misalignment convergence of the XM-NL-NLMS algorithm (solid) for different positions of the source, as compared to NL-NLMS (dashed).
To gain further insight into the degradation in convergence rate of XM-NL-NLMS with respect to the source position, we consider both the interchannel coherence as well as the ratio of selected tap-input energy to the total tap-input energy.
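For readers who want to reproduce a misalignment curve, the following Python sketch computes a normalized misalignment between a true and an estimated coefficient vector. It assumes the common definition (squared error norm divided by the squared norm of the true response) and reports the value in decibels, which is an assumption about presentation rather than something stated in the text; the function name is illustrative.

```python
import math
from typing import List


def normalized_misalignment_db(h: List[float], h_hat: List[float]) -> float:
    """Return 10*log10(||h - h_hat||^2 / ||h||^2); -inf for a perfect estimate."""
    err_energy = sum((a - b) ** 2 for a, b in zip(h, h_hat))
    ref_energy = sum(a * a for a in h)
    if ref_energy == 0.0:
        raise ValueError("true impulse response must have non-zero energy")
    if err_energy == 0.0:
        return float("-inf")
    return 10.0 * math.log10(err_energy / ref_energy)


# Usage: an all-zero estimate gives 0 dB; a 10% coefficient error gives about -20 dB.
start_db = normalized_misalignment_db([1.0, 0.0], [0.0, 0.0])
later_db = normalized_misalignment_db([1.0, 0.0], [0.9, 0.0])
```

In a two-channel setting, `h` and `h_hat` would be the concatenated true and estimated impulse responses of both channels.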
Fig. 4. Average interchannel coherence of the NL-NLMS (dashed) and XM-NL-NLMS (solid) algorithms for various positions of the source.
A. Effect of XM Tap Selection on Interchannel Coherence
We first investigate the effect of XM tap-selection on the interchannel coherence for various source positions. We denote the XM subselected tap-input vector in (8) as

$$\breve{\mathbf{x}}_j(n) = \mathbf{Q}_j(n)\,\mathbf{x}'_j(n) \tag{12}$$

The interchannel coherence between $\breve{\mathbf{x}}_1(n)$ and $\breve{\mathbf{x}}_2(n)$ is then defined by

$$C(f) = \frac{|P_{12}(f)|^2}{P_{11}(f)\,P_{22}(f)} \tag{13}$$

where $P_{ij}(f)$ is the cross power spectrum between $\breve{\mathbf{x}}_i(n)$ and $\breve{\mathbf{x}}_j(n)$ and $f$ is the normalized frequency. For the same condition as in Fig. 3, Fig. 4 shows the mean interchannel coherence between $\breve{\mathbf{x}}_1(n)$ and $\breve{\mathbf{x}}_2(n)$, across different frequencies for various source positions, obtained by averaging over ten independent trials.
As can be seen, when the source is in front of the microphone pair centroid ($x = 2.85$ m), the XM tap-selection criterion is utilized efficiently to decorrelate input vectors $\mathbf{x}'_1(n)$ and $\mathbf{x}'_2(n)$, giving a low interchannel coherence of 0.43. Due to this efficient decorrelation, a good misalignment convergence shown in Fig. 3 is achieved. For this source location, the modest amount of degradation due to tap selection does not significantly outweigh the benefits brought about by the reduction in interchannel coherence due to the exclusivity criterion. Fig. 5(a) shows an example of the XM selected taps $\breve{\mathbf{x}}_1(n)$ and $\breve{\mathbf{x}}_2(n)$ in this case. For clarity, we show only the first 80 samples of $\breve{\mathbf{x}}_1(n)$ and $\breve{\mathbf{x}}_2(n)$, while the vectors are each of length $L$ samples. As can be seen, most of the selected taps in the first channel correspond to elements in $\mathbf{x}'_1(n)$ being greater than zero, whereas for the second channel, most of the active taps correspond to elements in $\mathbf{x}'_2(n)$ being less than zero. This is due to the intrinsic effect of NL preprocessing, which increases the magnitude of the positive elements in the first channel and the negative elements in the second channel. As a result of XM tap selection on $\mathbf{x}'_1(n)$ and $\mathbf{x}'_2(n)$, low interchannel coherence of 0.43 is achieved as shown in Fig. 4.
As shown in Fig. 4, the interchannel coherence between $\breve{\mathbf{x}}_1(n)$ and $\breve{\mathbf{x}}_2(n)$ employing XM tap selection increases from 0.43 at $x = 2.85$ m as the source moves away from the centroid. To understand this increase in interchannel coherence even after XM tap selection is applied, Fig. 5(b) shows a plot of $\breve{\mathbf{x}}_1(n)$ and $\breve{\mathbf{x}}_2(n)$ for an intermediate source position.
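Fig. 5. Selected tap inputs of the two channels for the XM-NL-NLMS algorithm for various source positions. (a) Source in front of the microphone pair centroid. (b) Source at an intermediate position. (c) Source in front of the right microphone.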
As can be seen, the effect of NL preprocessing on XM tap selection reduces and the similarity between the selected tap inputs of the two channels increases. The increase in interchannel coherence between the two selected tap-input vectors in turn reduces the convergence rate for XM-NL-NLMS, as can be seen from Fig. 3.
Fig. 5(c) illustrates the selected tap inputs of the two channels when the source is in front of the right microphone at $x = 3$ m. Due to the source being further away from the left microphone, elements in $\mathbf{x}'_1(n)$ have magnitudes much lower than those of $\mathbf{x}'_2(n)$. In addition, it is expected that the selected tap-input vector of the first channel differs from that of the second channel. As a result of this difference, the interchannel coherence is lower for $x = 3$ m compared to the intermediate position, as shown in Fig. 4. It is therefore expected that the misalignment convergence rate of XM-NL-NLMS should increase when the source moves from the intermediate position to $x = 3$ m. On the contrary, however, the XM-NL-NLMS convergence rate continues to reduce with increasing $x$ position, as can be seen from Fig. 3. In Section III-B, we gain better insight into this contradictory behavior by studying the effect of XM tap selection on the energies of the active tap inputs for different source positions.
B. Effect of XM Tap Selection on Tap-Input Energies
In order to further illustrate how XM tap selection affects the energies of the tap inputs, we employ the $\mathcal{M}$ ratio criterion [17] defined as

$$\mathcal{M}(n) = \frac{\|\mathbf{Q}(n)\,\mathbf{x}'(n)\|^2}{\|\mathbf{x}'(n)\|^2} \tag{14}$$

where $\mathbf{Q}(n)$ is the block-diagonal matrix formed from $\mathbf{Q}_1(n)$ and $\mathbf{Q}_2(n)$ for the XM tap selection, with elements defined by (9) and (10). Our intention is not to show the exact relationship between $\mathcal{M}$ and the misalignment convergence rate, but to illustrate that the loss of tap-input energy has an undesirable effect on the convergence rate of the adaptive filters. Fig. 6 illustrates how $\mathcal{M}$ varies with source position for XM-NL-NLMS and NL-NLMS, obtained by averaging over all frames in the signal where each frame is calculated using (14). As can be seen, NL-NLMS has $\mathcal{M} = 1$ for all source positions since all tap inputs are used for weight update. On the other hand, for XM-NL-NLMS, $\mathcal{M} < 1$ and increases with the $x$ position of the source.
Fig. 6. Variation of $\mathcal{M}$ against source position for NL-NLMS (dashed) and XM-NL-NLMS (solid).
We can now see from Figs. 4 and 6 why XM-NL-NLMS achieves poorer convergence performance when the source is far from the centroid of the microphone pair: although the interchannel coherence is relatively low, the $\mathcal{M}$ ratio is not sufficiently high to reduce the degradation in convergence rate due to tap selection. As a consequence of this conflict between the need to reduce interchannel coherence and maximization of tap-input energies, the overall result is a reduction in convergence rate of XM-NL-NLMS, as can be seen from Fig. 3. On the other hand, when $x = 2.85$ m, the degradation in convergence rate due to a reduction in $\mathcal{M}$ is offset by a significant reduction in interchannel coherence, as shown in Figs. 6 and 4, respectively. As a result of this joint effect, good overall convergence performance can be obtained for XM-NL-NLMS, as can be seen in Fig. 3.
As an additional note, the position of the source affects not only the misalignment convergence of XM-NL-NLMS, but also its steady-state value. As can be seen from Fig. 3, the steady-state normalized misalignment is higher than that of NL-NLMS for increasing $x$ position since the weight update is performed using only a fraction of the tap inputs. This causes an additional error in the weight update, resulting in an increase in the steady-state normalized misalignment.
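The ratio of selected tap-input energy to total tap-input energy discussed in this subsection can be computed directly from the two tap-input vectors and their selection vectors. The sketch below assumes that the ratio is the squared norm of the selected inputs divided by the squared norm of all inputs across both channels; the function name `m_ratio` and the handling of an all-zero input are illustrative assumptions.

```python
from typing import List


def m_ratio(x1: List[float], x2: List[float],
            q1: List[int], q2: List[int]) -> float:
    """Fraction of the two-channel tap-input energy carried by selected taps."""
    selected = (sum(q * a * a for q, a in zip(q1, x1))
                + sum(q * b * b for q, b in zip(q2, x2)))
    total = sum(a * a for a in x1) + sum(b * b for b in x2)
    # An all-zero input carries no energy, so report zero rather than divide by zero.
    return selected / total if total > 0.0 else 0.0


# Usage: exclusive selection keeps part of the energy; full update keeps all of it.
x1, x2 = [3.0, -1.0, 0.5, 2.0], [1.0, -2.0, 0.4, -5.0]
partial = m_ratio(x1, x2, [1, 0, 1, 0], [0, 1, 0, 1])
full = m_ratio(x1, x2, [1, 1, 1, 1], [1, 1, 1, 1])
```

A full update always yields a ratio of one, while any exclusive selection yields a smaller value, consistent with the behaviour described for NL-NLMS and XM-NL-NLMS.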
IV. CENTER-CLIPPING APPROACH TO SAEC
We now propose a center-clipping algorithm that has the ability to reduce the interchannel coherence according to the source location in the transmission room. We exploit the decorrelation properties of XM tap selection, similar to that of XM-NL-NLMS. In addition, we propose an error-based compensation technique that addresses the additional steady-state normalized misalignment resulting from XM tap selection.
A. Center-Clipping Exclusive-Maximum Tap Selection
We propose to apply center-clipping to the tap-input vectors in order to increase the energies of the “inactive” tap inputs when the interchannel coherence between $\mathbf{x}'_1(n)$ and $\mathbf{x}'_2(n)$ is relatively low, such as when the source is in front of one of the microphones. This is to reduce the degradation in misalignment convergence brought about by the XM tap-selection process. As will be shown in Section IV-D, the clipping threshold is based on indirect estimation of the similarity between the energies of $\mathbf{x}'_1(n)$ and $\mathbf{x}'_2(n)$. This similarity reflects how close the source is to the microphone centroid, which in turn affects the convergence behavior. The proposed approach ensures a soft-optimization constraint, which makes it robust to source position. A schematic diagram of the proposed method is shown in Fig. 7.
Fig. 7. Schematic diagram of the proposed structure.
The proposed clipping-based XM-NL-NLMS algorithm (cXM-NL-NLMS) updates the filter coefficients using

$$\hat{\mathbf{h}}_j(n) = \hat{\mathbf{h}}_j(n-1) + \frac{\mu\,\tilde{\mathbf{x}}_j(n)\,e(n)}{\|\mathbf{x}'(n)\|^2 + \delta} \tag{15}$$

$$\tilde{\mathbf{x}}_j(n) = \mathbf{Q}_j(n)\,\mathbf{x}'_j(n) + \bar{\mathbf{Q}}_j(n)\,\bar{\mathbf{x}}_j(n) \tag{16}$$

where $\mathbf{x}'_j(n)$ is defined in (6) and (7), $\mathbf{Q}_j(n)$ is the XM tap-selection matrix with diagonal elements defined in (9) and (10), and the $\bar{\mathbf{Q}}_j(n)$ matrix is defined by

$$\bar{\mathbf{Q}}_j(n) = \mathbf{I}_{L \times L} - \mathbf{Q}_j(n) \tag{17}$$

The matrix $\bar{\mathbf{Q}}_j(n)$ is used to identify the tap-input elements not selected by XM. The purpose of the proposed center-clipping strategy is to increase the energies of tap inputs corresponding to the “inactive” (unselected) taps. We achieve this by first defining $\bar{\mathbf{x}}_j(n)$ as the clipped vector whose elements are computed by (18), where $c_j(n)$ controls the amount of clipping for $\mathbf{x}'_j(n)$. In Section IV-D, we discuss how this clipping threshold can be determined for our SAEC application.
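B. Effect of $c_j(n)$ on Tap-Input Energy
The range of $c_j(n)$ for each tap-input vector $\mathbf{x}'_j(n)$ can be bounded between zero and the maximum magnitude of any element within that vector, i.e., $0 \leq c_j(n) \leq c_{\max,j}(n)$, where

$$c_{\max,j}(n) = \max_{i} |x'_j(n-i)| \tag{19}$$

It can be seen from (18) that when $c_j(n) = 0$, we have $\bar{\mathbf{x}}_j(n) = \mathbf{x}'_j(n)$, which results in the second term of (16) having values equivalent to the unselected tap inputs, so that $\tilde{\mathbf{x}}_j(n) = \mathbf{x}'_j(n)$ and the proposed cXM-NL-NLMS algorithm becomes the full-update NL-NLMS algorithm. On the other hand, when $c_j(n) = c_{\max,j}(n)$, we have $\bar{\mathbf{x}}_j(n) = \mathbf{0}$, causing the second term of (16) to vanish, hence reducing cXM-NL-NLMS to XM-NL-NLMS.
In order to illustrate how the clipping threshold $c_j(n)$ affects the energies of the tap-input vectors $\tilde{\mathbf{x}}_1(n)$ and $\tilde{\mathbf{x}}_2(n)$, we employ a $\mathcal{M}_c$ ratio criterion similar to the one defined by (14)

$$\mathcal{M}_c(n) = \frac{\|\tilde{\mathbf{x}}(n)\|^2}{\|\mathbf{x}'(n)\|^2} \tag{20}$$

where the subscript $c$ in $\mathcal{M}_c$ denotes center-clipped signals. Fig. 8 illustrates how $\mathcal{M}$ and $\mathcal{M}_c$ vary with $c_j(n)/c_{\max,j}(n)$ using signals generated by convolving the previously defined colored speech-like noise sequence with $\mathbf{g}_1$ and $\mathbf{g}_2$ when the source position is at (2.85, 1.85, 1.6) m. If $c_j(n) = 0$ for both channels, $\mathcal{M}_c = 1$ and therefore the convergence behavior of the center-clipped cXM-NL-NLMS algorithm will be equivalent to NL-NLMS. On the other hand, when $c_j(n) = c_{\max,j}(n)$, we have $\mathcal{M}_c = \mathcal{M}$ and hence the performance of the proposed cXM-NL-NLMS algorithm will be equivalent to XM-NL-NLMS.
Fig. 8. Variation of $\mathcal{M}$ and $\mathcal{M}_c$ against $c_j(n)/c_{\max,j}(n)$ for the XM tap-selection algorithm and center-clipping algorithm, respectively.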
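The two limiting cases described in this subsection can be checked with a small Python sketch. Since the exact clipping rule of (18) is not legible here, the sketch assumes the classical center clipper, which zeroes samples whose magnitude is at most the threshold and shrinks the remaining samples towards zero by the threshold. The function names `center_clip` and `clipped_xm_input` are hypothetical.

```python
from typing import List


def center_clip(x: List[float], c: float) -> List[float]:
    """Classical center clipper (an assumed form): shrink magnitudes by c, floor at 0."""
    out = []
    for v in x:
        if v > c:
            out.append(v - c)
        elif v < -c:
            out.append(v + c)
        else:
            out.append(0.0)
    return out


def clipped_xm_input(x: List[float], q: List[int], c: float) -> List[float]:
    """Keep selected taps unchanged and fill unselected taps with clipped values."""
    clipped = center_clip(x, c)
    return [v if sel else u for v, u, sel in zip(x, clipped, q)]


# Usage: threshold 0 gives the full input; the maximum magnitude gives XM selection.
x, q = [2.0, -1.0, 0.5, -3.0], [1, 0, 0, 1]
full_update = clipped_xm_input(x, q, 0.0)
xm_only = clipped_xm_input(x, q, 3.0)
in_between = clipped_xm_input(x, q, 0.25)
```

Intermediate thresholds return part of the energy of the unselected taps to the update vector, which is the trade-off between decorrelation and tap-input energy that the threshold is meant to control.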
C. Effect of Clipping Threshold on Interchannel Coherence
As illustrated in Section III, the misalignment convergence of XM-NL-NLMS depends on both the tap-input energy as well as the interchannel coherence between the tap-input vectors of the two channels. As such, we investigate the effect of $c_j(n)$ on the interchannel coherence between $\tilde{\mathbf{x}}_1(n)$ and $\tilde{\mathbf{x}}_2(n)$. We convolve the same colored Gaussian noise sequence with $\mathbf{g}_1$ and $\mathbf{g}_2$, with the source positioned at coordinates (2.85, 1.85, 1.6) m. A total of ten independent trials are averaged and Fig. 9 illustrates how the mean interchannel coherence across frequency varies with the normalized clipping threshold, where we constrain $c_1(n)$ and $c_2(n)$ such that both correspond to the same normalized threshold. As can be observed, the interchannel coherence reduces with increasing values of clipping threshold. This is as expected since with increasing $c_j(n)$, less energy will be allocated to the unselected taps brought about by the XM tap-selection criterion, thus reducing the similarity between $\tilde{\mathbf{x}}_1(n)$ and $\tilde{\mathbf{x}}_2(n)$, which as a consequence causes a reduction in interchannel coherence.
Fig. 9. Interchannel coherence versus clipping threshold for colored noise signal.
Fig. 10. Variation of $\xi(n)$ against horizontal position $x$ of the source in the transmission room for four cases of vertical positions $y$.
D. Soft-Decision Rule for Clipping Threshold
As shown in Figs. 8 and 9, a high value of $c_j(n)$ will reduce both $\mathcal{M}_c$ and interchannel coherence. As described in Section III, a reduction in interchannel coherence is crucial when the source position is near the centroid of the microphone pair, while the need to increase $\mathcal{M}_c$ becomes important when the source is nearer to one microphone. Hence, $c_j(n)$ enables a tradeoff between interchannel decorrelation and degradation of misalignment convergence due to tap selection. Therefore, high $c_j(n)$ is desirable when the source is near the centroid of the microphone pair, while low $c_j(n)$ is desirable when the source is near one of the microphones. As was pointed out, the similarity between the energies of $\mathbf{x}'_1(n)$ and $\mathbf{x}'_2(n)$ contributes to the high convergence rate of XM-NL-NLMS when the source is near the microphone pair centroid. On the other hand, considerable difference in the energies as well as low interchannel coherence between $\mathbf{x}'_1(n)$ and $\mathbf{x}'_2(n)$ cause the convergence rate of XM-NL-NLMS to reduce significantly.
Fig. 11. Clipping threshold $c_j(n)$ versus $\xi(n)$.
We therefore propose to use the difference between the absolute values of the two channels as a measure of energy dissimilarity. It is foreseeable that when the source is near the microphone array centroid, the relative absolute values of the two tap-input vectors are approximately equal. On the contrary, when the source is nearer to one microphone, the relative absolute values of these tap-input vectors differ from each other. Thus, the dissimilarity measure is defined as
(21)
where the per-channel term, for channels 1 and 2, is a moving average of the absolute values of the input elements given by
(22)
Here, the moving average is smoothed so as to avoid the effects of instantaneous changes of the input on the value of the clipping threshold. We see that the dissimilarity measure is close to 0 when the received energies of the two microphones are approximately equal and approaches 1 when the received energy from one microphone is much larger than that of the other microphone.
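A minimal Python sketch of a dissimilarity measure with the stated behavior (near 0 for similar channel levels, approaching 1 when one channel dominates) is given here. The exponential smoother, the normalized-difference form, and the parameter `beta` are assumptions chosen for illustration; they are not taken from (21) and (22).

```python
def smoothed_abs(samples, beta=0.9):
    """Exponentially weighted moving average of |x(n)| (assumed smoother).
    beta is a hypothetical smoothing factor between 0 and 1."""
    avg = 0.0
    for s in samples:
        avg = beta * avg + (1.0 - beta) * abs(s)
    return avg


def dissimilarity(avg1, avg2, eps=1e-12):
    """Normalized absolute difference of the two channel averages (assumed form).
    The result lies in [0, 1]; eps guards against division by zero."""
    return abs(avg1 - avg2) / (avg1 + avg2 + eps)


if __name__ == "__main__":
    loud = smoothed_abs([1.0, -1.0] * 50)     # channel close to the source
    quiet = smoothed_abs([0.01, -0.01] * 50)  # channel far from the source
    print("similar levels:  ", dissimilarity(loud, loud))
    print("one channel loud:", dissimilarity(loud, quiet))
```

Normalizing by the sum keeps the measure independent of the overall signal level, so only the relative level of the two channels matters.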
Fig. 10 shows four illustrative examples of how the dissimilarity measure varies for different settings and positions of the source in the transmission room. In these examples, the room dimensions and the coordinates of the microphones are given as shown in Fig. 2. As expected, the value of the dissimilarity measure is small when the source is close to the centroid. On the contrary, it is large when the source is near Microphone 2.
We now incorporate the dissimilarity measure into cXMNL-NLMS by varying the value of the clipping threshold as a function of this measure. Since we desire a low threshold when the source is nearer to one microphone and vice versa, the threshold should reduce with increasing dissimilarity. We therefore propose a piecewise linear mapping
(23)
The relation between the dissimilarity measure and the clipping threshold is plotted in Fig. 11. For speech signals, we use parameter values which were determined empirically.
We note from (23) that the clipping threshold is independently determined for each channel, and indirectly depends on the relative position of the source and the microphones. In addition, when the dissimilarity measure is small, such as when the source is near the microphone pair centroid, the clipping threshold increases with reducing dissimilarity. This reduces the effect of the second term in (16) and as a result, the proposed algorithm converges in the same manner as XMNL-NLMS. On the other hand, when the dissimilarity measure increases, such as when the source is near one of the microphones, the clipping threshold reduces to zero. As shown in Fig. 8, this has the effect of increasing the energy of the unselected taps, which in turn reduces the degradation in misalignment convergence due to XM tap selection in situations where reducing the interchannel coherence cannot further improve the convergence rate of the adaptive algorithm.
Fig. 12. Comparison of the two plotted measures for NL-NLMS, XMNL-NLMS, and cXMNL-NLMS.
Fig. 12 further illustrates how the degradation in the energy measure of Fig. 8, and consequently in the convergence performance due to tap selection, can be reduced by incorporating the dissimilarity measure into the proposed cXMNL-NLMS algorithm. As described earlier in Section III-B, the degradation of convergence performance for XMNL-NLMS when the source is far away from the microphone pair centroid is due to a reduction of this measure compared to NL-NLMS. In this scenario, the proposed cXMNL-NLMS algorithm ensures that the measure is closer to the value 1 achieved by NL-NLMS. On the other hand, when the source is near the centroid, cXMNL-NLMS attains the beneficial properties of the XM tap-selection strategy to maximally decorrelate the tap-input vectors. The overall joint result is a fast converging cXMNL-NLMS that is robust to the source position in the transmission room.
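The soft-decision rule can be sketched as a piecewise linear function that holds the threshold at a maximum for small dissimilarity, decreases it linearly, and sets it to zero for large dissimilarity. The breakpoints `xi_low` and `xi_high` and the ceiling `t_max` below are hypothetical placeholders, since the empirically determined values of (23) are not reproduced in this text.

```python
def clipping_threshold(xi, xi_low=0.2, xi_high=0.6, t_max=1.0):
    """Piecewise linear, non-increasing map from dissimilarity xi to a
    normalized clipping threshold. All default values are hypothetical."""
    if xi <= xi_low:
        return t_max                    # source near centroid: strong clipping
    if xi >= xi_high:
        return 0.0                      # source near one microphone: no clipping
    # Linear interpolation between (xi_low, t_max) and (xi_high, 0).
    return t_max * (xi_high - xi) / (xi_high - xi_low)


if __name__ == "__main__":
    for xi in (0.0, 0.2, 0.3, 0.4, 0.5, 0.6, 1.0):
        print(f"xi = {xi:.1f} -> threshold = {clipping_threshold(xi):.3f}")
```

Because the threshold is computed per channel from a bounded measure, the mapping needs no knowledge of the source position itself.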
V. ENHANCEMENT OF THE STEADY-STATE PERFORMANCE
As noted from Fig. 3 and Section III, the steady-state normalized misalignment of XMNL-NLMS is higher than that of NL-NLMS. This is due to the unselected filter coefficients introducing an error during adaptation, since there is now a mismatch between the full tap-input vector, which drives the unknown system, and the selective tap-input vector. This error occurs regardless of the source position. It is therefore expected that cXMNL-NLMS also suffers from increased steady-state misalignment, since the proposed clipping method generates a signal that is different from the full tap-input vector.
To illustrate this, we consider input vectors defined by (17), each of equal length. A colored noise source signal, generated as described in Section III, is positioned at coordinates (2.89,1.85,1.6) m. As before, the room is of the dimensions shown in Fig. 2. The steady-state normalized misalignment is obtained by allowing the algorithm to reach its steady state and averaging over the last 5000 samples. Fig. 13 shows how the steady-state normalized misalignment varies with the normalized clipping threshold.
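The steady-state figure described above can be computed as in the following Python sketch. It uses the standard definition of normalized misalignment (squared norm of the coefficient error divided by the squared norm of the true response, in dB) and assumes that the averaging over the last 5000 samples is done on the dB curve; the averaging domain is an assumption.

```python
import math


def normalized_misalignment_db(h_true, h_est):
    """10*log10(||h_true - h_est||^2 / ||h_true||^2).
    Assumes h_est differs from h_true (a zero error has no finite dB value)."""
    num = sum((a - b) ** 2 for a, b in zip(h_true, h_est))
    den = sum(a * a for a in h_true)
    return 10.0 * math.log10(num / den)


def steady_state_db(curve_db, tail=5000):
    """Mean of the last `tail` points of a misalignment curve given in dB."""
    tail_values = curve_db[-tail:]
    return sum(tail_values) / len(tail_values)


if __name__ == "__main__":
    h = [1.0, 0.0]          # toy "true" impulse response
    h_hat = [0.9, 0.0]      # toy estimate
    print(normalized_misalignment_db(h, h_hat))
    curve = [-10.0] * 10 + [-30.0] * 5000   # toy convergence curve
    print(steady_state_db(curve))
```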
Fig. 13. Relation between steady-state misalignment and normalized clipping threshold in NL-NLMS, XMNL-NLMS, and cXMNL-NLMS, when the source is at (2.89,1.85,1.6) m.
As can be seen, the steady-state normalized misalignment for NL-NLMS is -29 dB, and -24 dB for XMNL-NLMS. If we employ the clipping threshold defined by (23) for the above source position, the proposed cXMNL-NLMS algorithm gives an additional 1 dB of steady-state normalized misalignment improvement over XMNL-NLMS.
As a final improvement, we propose to enhance the steady-state normalized misalignment performance of cXMNL-NLMS. We note that the steady-state performance of cXMNL-NLMS depends on the source position and therefore, for source positions where it is degraded, we need to reduce the additional steady-state normalized misalignment. Hence, we propose to reduce the clipping threshold to zero after convergence of the mean-square error (MSE). To estimate the convergence of the algorithm, we employ the following recursive relation for estimating the MSE [24]
(24)
where the weighting factor is related to the time constant of the averaging process and is held fixed for our experiments. Hence, when the estimated MSE falls below a lower limit, the clipping threshold will be set to zero. In this case, the normalized misalignment reduces towards the steady-state misalignment of NL-NLMS. We therefore propose to incorporate the estimated MSE into (23), giving
(25)
where the lower limit is empirically derived and aims to achieve a low level of MSE after convergence. Fig. 14 shows an example of MSE and misalignment convergence for a single trial when the source is at coordinates (2.87,1.85,1.6) m. As can be seen, setting the clipping threshold based on (23) brings about a higher initial convergence rate than NL-NLMS, while reducing the threshold to zero using (25) after MSE convergence brings about an additional reduction in steady-state normalized misalignment. Additional tests revealed that although the convergence of the MSE occurs before the convergence of the misalignment, exact knowledge of the MSE is not required. The proposed cXMNL-NLMS algorithm is summarized in Table I.
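The gating idea can be sketched in Python as follows: a recursive (exponentially weighted) MSE estimate is updated each sample, and the clipping threshold produced by the soft-decision rule is forced to zero once the estimate falls below a floor. The first-order recursion, the weighting factor `lam`, and the floor value are assumptions for illustration and may differ from (24) and (25).

```python
def update_mse(prev_mse, error, lam=0.99):
    """First-order recursive MSE estimate (assumed form):
    mse(n) = lam * mse(n-1) + (1 - lam) * e(n)^2."""
    return lam * prev_mse + (1.0 - lam) * error * error


def gated_threshold(mapped_threshold, mse_estimate, mse_floor):
    """Use the mapped threshold until the MSE estimate drops below the floor,
    then switch clipping off by returning zero."""
    return mapped_threshold if mse_estimate >= mse_floor else 0.0


if __name__ == "__main__":
    mse = 1.0            # hypothetical initial estimate
    mse_floor = 1e-3     # hypothetical lower limit
    for n in range(2000):
        error = 0.995 ** n                    # toy decaying error signal
        mse = update_mse(mse, error)
        threshold = gated_threshold(0.8, mse, mse_floor)
        if threshold == 0.0:
            print("clipping switched off at sample", n)
            break
```

Only a coarse MSE estimate is needed for this switch, which is consistent with the observation that exact knowledge of the MSE is not required.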
Fig. 14. Estimated MSE and misalignment for the cXMNL-NLMS algorithm.
TABLE I
THE cXMNL-NLMS ALGORITHM
VI. FURTHER SIMULATION AND EXPERIMENTAL RESULTS
We evaluate, by way of further simulation, the performance of cXMNL-NLMS under different source positions. In order to simulate the SAEC system, the four impulse responses of the system were generated using the method of images [22]. To evaluate the robustness of the algorithms, we fixed the location of the microphones while the source position in the transmission room was varied across the three cases shown in Table II. The same sampling rate was used throughout the experiment. The source signal was generated by filtering a white Gaussian noise signal through a lowpass finite impulse response (FIR) filter, as was used in Section III.
TABLE II
SPECIFICATIONS OF THE SIMULATED ENVIRONMENT IN SAEC
We compare the convergence performance of the proposed cXMNL-NLMS algorithm with NL-NLMS and XMNL-NLMS. Since the steady-state normalized misalignment for XMNL-NLMS varies with the source position, we chose its step-size so that its steady-state normalized misalignment reaches that of NL-NLMS and cXMNL-NLMS when the source position is in front of the microphone array centroid at (2.85,1.85,1.6) m. This corresponds to one step-size for both NL-NLMS and cXMNL-NLMS and a different step-size for XMNL-NLMS. White Gaussian noise (WGN) is added to achieve the desired signal-to-noise ratio (SNR). For all simulations, we have used the same parameter setting, specified in dB, for cXMNL-NLMS. The normalized misalignment curves, obtained by averaging over ten independent trials, are plotted for Cases 1, 2, and 3 (Table II) in Fig. 15(a)–(c), respectively.
Fig. 15(a) shows the convergence performance of the algorithms where the source is directly in front of the right microphone. In this case, the received energies of the two channels are significantly different and hence a high value of the dissimilarity measure defined in (21) is expected. As shown in Fig. 11, this translates to a low clipping threshold and consequently, as shown in (18), the convergence performance of cXMNL-NLMS is equivalent to that of NL-NLMS. The proposed cXMNL-NLMS algorithm thus achieves an initial convergence nearly 8 dB better than XMNL-NLMS and reaches a steady-state normalized misalignment that is 4 dB lower, as expected.
Fig. 15(b) shows convergence results when the source position is midway between the microphone pair centroid and the right microphone at (2.88,1.85,1.6) m. Now, the interchannel coherence increases relative to the previous case and, as can be seen from this result, cXMNL-NLMS achieves the highest rate of initial convergence, improving on that of NL-NLMS by nearly 4 dB during initial convergence. We note that when compared to XMNL-NLMS, cXMNL-NLMS achieves approximately 3 dB improvement during initial convergence and about 2 dB lower steady-state normalized misalignment.
Fig. 15. Normalized misalignment of the NL-NLMS, XMNL-NLMS, and cXMNL-NLMS algorithms for a colored noise source signal. (a) Source directly in front of right microphone at (3,1.9,1.55) m. (b) Source at (2.88,1.85,1.6) m. (c) Source in the center of microphone pair at (2.85,1.85,1.6) m.
Finally, when the source position is in front of the microphone pair centroid at coordinates (2.85,1.85,1.6) m, the received energies of the two channels are similar and the interchannel coherence between the two tap-input vectors is high. As can be seen from Fig. 15(c), cXMNL-NLMS achieves the highest rate of convergence, with an improvement of approximately 4 dB over that of XMNL-NLMS and nearly 10 dB over that of NL-NLMS. In terms of steady-state normalized misalignment, the NL-NLMS algorithm requires nearly 10 s more than cXMNL-NLMS to reach -30 dB.
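The simulations add WGN to a signal at a prescribed SNR. A minimal Python sketch of that step is shown here; it scales a seeded Gaussian noise sequence so that the ratio of measured signal power to measured noise power matches the target. The function name, the seed, and the toy sinusoid are illustrative assumptions.

```python
import math
import random


def add_wgn(signal, snr_db, seed=0):
    """Return (noisy_signal, noise), with the noise scaled so that
    10*log10(P_signal / P_noise) equals snr_db for this realization."""
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in signal]
    p_signal = sum(s * s for s in signal) / len(signal)
    p_noise = sum(v * v for v in noise) / len(noise)
    # Gain that brings the noise power to p_signal / 10^(snr_db / 10).
    gain = math.sqrt(p_signal / (p_noise * 10.0 ** (snr_db / 10.0)))
    scaled = [gain * v for v in noise]
    return [s + v for s, v in zip(signal, scaled)], scaled


if __name__ == "__main__":
    clean = [math.sin(0.1 * n) for n in range(1000)]   # toy signal
    noisy, noise = add_wgn(clean, snr_db=30.0)
    p_s = sum(s * s for s in clean) / len(clean)
    p_n = sum(v * v for v in noise) / len(noise)
    print("measured SNR (dB):", 10.0 * math.log10(p_s / p_n))
```

Scaling against the measured noise power, rather than the nominal variance, makes the SNR exact for each trial, which is convenient when averaging over a small number of independent trials.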
To further illustrate the convergence performance of the proposed cXMNL-NLMS algorithm, we simulated the SAEC system using a speech signal as shown in Fig. 16. In this example, the speech signal is sampled at 11025 Hz and WGN is added to achieve the specified SNR. The position of the source in the transmission room is (2.88,1.85,1.6) m. As can be seen from this result, cXMNL-NLMS achieves approximately 6 dB lower misalignment than NL-NLMS and 4 dB lower than XMNL-NLMS during initial convergence.
Fig. 16. Normalized misalignment for the NL-NLMS, XMNL-NLMS, and cXMNL-NLMS algorithms when the source is at (2.88,1.85,1.6) m for a speech signal.
We also consider recorded impulse responses, where the dimensions of the transmission room are 6.5 m × 8.75 m × 2.65 m. The source was positioned at (3.25,4.37,1.15) m while the two microphones were placed at (3.11,2.37,1.2) m and (3.39,2.37,1.2) m for Case 1 and at (3.25,2.37,1.2) m and (2.83,2.37,1.2) m for Case 2. The estimated reverberation time was 280 ms. These impulse responses were of length 3087 samples and were subsequently truncated to 512 samples. Fig. 17 shows one of the measured impulse responses in the transmission room. For this experiment, the sampling frequency, step-sizes, and SNRs were the same as those of the previous simulations. The results are shown in Fig. 18. As can be seen from Fig. 18(a), cXMNL-NLMS achieves nearly 3 dB improvement in convergence performance compared to XMNL-NLMS when the source is in front of the microphone centroid. In Fig. 18(b), when the source is in front of the right microphone, the proposed algorithm achieves nearly 6 dB improvement in convergence compared to XMNL-NLMS.
Fig. 17. Illustration of a measured transmission room impulse response.
Fig. 18. Normalized misalignment of the NL-NLMS, XMNL-NLMS, and cXMNL-NLMS algorithms for real room impulse responses. (a) Case 1: source in the center of microphone pair. (b) Case 2: source approximately in front of the right microphone.
VII. CONCLUSION
We presented a new approach to improve the misalignment convergence as well as the steady-state performance and robustness of adaptive filters for SAEC. This approach retains the decorrelation properties of the XM selective-tap algorithm when the source is located near the microphone centroid, but employs a variable center-clipping threshold, whose value is derived from the absolute values of the received microphone signals, in order to perform better when the source is located closer to one of the microphones. The proposed approach achieves better convergence performance for different source positions in comparison to both NL-NLMS and XMNL-NLMS.
REFERENCES
[1] Audio Signal Processing for Next-Generation Multimedia Communication Systems, Y. Huang and J. Benesty, Eds. Norwell, MA: Kluwer, 2004.
[2] J. Benesty, M. M. Sondhi, and Y. Huang, Handbook of Speech Processing. Secaucus, NJ: Springer-Verlag, 2008.
[3] J. Benesty, T. Gänsler, D. R. Morgan, M. M. Sondhi, and S. L. Gay, Advances in Network and Acoustic Echo Cancellation. New York: Springer-Verlag, 2001.
[4] J. Benesty, D. R. Morgan, and M. M. Sondhi, “A better understanding and an improved solution to the specific problems of stereophonic acoustic echo cancellation,” IEEE Trans. Speech Audio Process., vol. 6, no. 2, pp. 156–165, Mar. 1998.
[5] M. M. Sondhi and D. R. Morgan, “Acoustic echo cancellation for stereophonic teleconferencing,” in Proc. IEEE Workshop Applicat. Signal Process. Audio Acoust., 1991, pp. 141–142.
[6] S. Shimauchi and S. Makino, “Stereo projection echo canceller with true echo path estimation,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., 1995, pp. 3059–3062.
[7] M. M. Sondhi, D. R. Morgan, and J. L. Hall, “Stereophonic acoustic echo cancellation - An overview of the fundamental problem,” IEEE Signal Process. Lett., vol. 2, no. 8, pp. 148–151, Aug. 1995.
[8] J. Benesty, D. R. Morgan, J. L. Hall, and M. M. Sondhi, “Stereophonic acoustic echo cancellation using nonlinear transformations and comb filtering,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., 1998, pp. 3673–3676.
[9] J. Benesty, D. R. Morgan, J. L. Hall, and M. M. Sondhi, “Synthesized stereo combined with acoustic echo cancellation for desktop conferencing,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., 1999, pp. 148–158.
[10] K. Mayyas, “Stereophonic acoustic echo cancellation using lattice orthogonalization,” IEEE Trans. Speech Audio Process., vol. 10, no. 7, pp. 517–525, Oct. 2002.
[11] T. Gänsler and J. Benesty, “New insights into the stereophonic acoustic echo cancellation problem and an adaptive nonlinearity solution,” IEEE Trans. Speech Audio Process., vol. 10, no. 5, pp. 257–267, Jul. 2002.
[12] M. Ali, “Stereophonic acoustic echo cancellation system using time-varying all-pass filtering for signal decorrelation,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., 1998, pp. 3689–3692.
[13] T. Tangsangiumvisai, J. A. Chambers, and A. G. Constantinides, “Time-varying allpass filters using spectral-shaped noise for signal decorrelation in stereophonic acoustic echo cancellation,” in Proc. Int. Conf. Digital Signal Process., 2002, pp. 87–92.
[14] J. Herre, H. Buchner, and W. Kellermann, “Acoustic echo cancellation for surround sound using perceptually motivated convergence enhancement,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., 2007, pp. I-17–I-20.
[15] J. M. Valin, “Perceptually-motivated nonlinear channel decorrelation for stereo acoustic echo cancellation,” in Proc. Hands-Free Speech Commun. Microphone Arrays (HSCMA), 2008, pp. 188–191.
[16] S. Emura, Y. Haneda, A. Kataoka, and S. Makino, “Stereo echo cancellation algorithm using adaptive update on the basis of enhanced input signal vector,” Signal Process., vol. 86, pp. 1157–1167, Jun. 2006.
[17] A. W. H. Khong and P. A. Naylor, “Stereophonic acoustic echo cancellation employing selective-tap adaptive algorithms,” IEEE Trans. Speech Audio Process., vol. 14, no. 3, pp. 785–796, May 2006.
[18] M. Bekrani, A. W. H. Khong, and M. Lotfizad, “Neural network based adaptive echo cancellation for stereophonic teleconferencing application,” in Proc. Int. Conf. Multimedia Expo, 2010, pp. 1172–1177.
[19] M. Bekrani, M. Lotfizad, and A. W. H. Khong, “An efficient quasi-LMS/Newton adaptive algorithm for stereophonic acoustic echo cancellation,” in Proc. IEEE Asia Pacific Conf. Circuits Syst., 2010.
[20] S. Haykin, Adaptive Filter Theory. Englewood Cliffs, NJ: Prentice-Hall, 2001.
[21] D. R. Morgan, J. L. Hall, and J. Benesty, “Investigation of several types of nonlinearities for use in stereo acoustic echo cancellation,” IEEE Trans. Speech Audio Process., vol. 9, no. 6, pp. 686–696, Sep. 2001.
[22] J. B. Allen and D. A. Berkley, “Image method for efficiently simulating small-room acoustics,” J. Acoust. Soc. Amer., vol. 65, no. 4, pp. 943–950, Apr. 1979.
[23] S. Attallah, “The wavelet transform-domain LMS adaptive filter with partial subband-coefficient updating,” IEEE Trans. Circuits Syst. II: Express Briefs, vol. 53, no. 1, pp. 8–12, Jan. 2006.
[24] K. Mayyas, “New transform-domain adaptive algorithms for acoustic echo cancellation,” Digital Signal Process., vol. 13, no. 3, pp. 415–432, Jul. 2003.
Mehdi Bekrani was born in Gorgan, Iran, in 1979. He received the B.Sc. degree from Ferdowsi University of Mashhad, Mashhad, Iran, in 2002, and the M.Sc. and Ph.D. degrees from Tarbiat Modares University, Tehran, Iran, in 2004 and 2010, respectively, all in electrical engineering.
He is currently a Research Fellow at Nanyang Technological University, Singapore. His current research interests include acoustic signal processing and its applications.
Andy W. H. Khong (M’06) received the B.Eng. degree from Nanyang Technological University, Singapore, in 2002 and the Ph.D. degree from the Department of Electrical and Electronic Engineering, Imperial College London, London, U.K., in 2005. His Ph.D. research was mainly on partial-update and selective-tap adaptive algorithms with applications to mono and multichannel acoustic echo cancellation for hands-free telephony.
He is currently an Assistant Professor in the School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore. Prior to that, he served as a Research Associate in the Department of Electrical and Electronic Engineering, Imperial College London, from 2005 to 2008. His postdoctoral research involved the development of signal processing algorithms for vehicle destination inference as well as the design and implementation of acoustic array and seismic fusion algorithms for perimeter security systems. He has also published works on acoustic blind channel identification and equalization for speech dereverberation. His other research interests include human-computer interfaces, source localization, speech enhancement, and blind deconvolution.
Mojtaba Lotfizad was born in Tehran, Iran, in 1955. He received the B.S. degree in electrical engineering from AmirKabir University of Technology, Tehran, in 1980, and the M.S. and Ph.D. degrees from the University of Wales, Cardiff, U.K., in 1985 and 1988, respectively.
He joined the Department of Electrical and Computer Engineering, Tarbiat Modares University, Tehran, Iran. He has also been a Consultant to several industrial and governmental organizations. His current research interests are in signal processing, adaptive filtering, speech processing, and specialized processors.