1826 IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 19, NO. 6, AUGUST 2011
A Clipping-Based Selective-Tap Adaptive
Filtering Approach to Stereophonic
Acoustic Echo Cancellation
Mehdi Bekrani, Andy W. H. Khong, Member, IEEE, and Mojtaba Lotfizad
Abstract—Stereophonic acoustic echo cancellation remains one
of the challenging areas for tele/video-conferencing applications.
However, the existence of high interchannel coherence between the
two input signals for such systems leads to considerable degrada-
tion in misalignment convergence of the adaptive filters. We pro-
pose a new algorithm for improving the convergence performance
and steady-state misalignment by considering robustness to the
source position in the transmission room. We achieve this by exploiting
the decorrelation properties of selective-tap adaptive
filtering as well as employing a variable clipping threshold
for the unselected taps. Simulation results using colored noise and
speech signals show an improvement over existing algorithms in both
convergence rate and steady-state misalignment.
Index Terms—Center clipping, convergence rate, interchannel
coherence, partial updating, selective-tap, stereophonic acoustic
echo cancellation (SAEC).
Manuscript received October 09, 2009; revised June 18, 2010; accepted
November 29, 2010. Date of publication February 10, 2011; date of current
version June 03, 2011. This work was supported in part by the Singapore
National Research Foundation Interactive Digital Media R&D Program under
research grant NRF2008IDM-IDM004-010 and in part by the Research Institute
for ICT. The associate editor coordinating the review of this manuscript
and approving it for publication was Prof. Sharon Gannot.
the School of Electrical and Electronic Engineering, Nanyang Technological
University, Singapore 639798 (e-mail: firstname.lastname@example.org).
A. W. H. Khong is with Nanyang Technological University, Nanyang Technological
University, Singapore 639798 (e-mail: email@example.com).
M. Lotfizad is with the Department of Electrical and Computer Engineering,
Tarbiat Modares University, P.O. Box 14115-143, Tehran, Iran (e-mail: lot-
Digital Object Identifier 10.1109/TASL.2010.2102752
I. INTRODUCTION
There has been increasing interest in employing stereophonic
audio communication systems for video/tele-conferencing,
home entertainment, and E-learning applications
in order to achieve better perception of sound. Such
systems have become increasingly popular since a stereophonic
audio system provides spatial information, leading to better
perception of the transmitted speech as well as improving the
ambience of the transmission room. These systems mitigate,
to a certain extent, the cocktail party problem that exists in
a multiparty conferencing scenario. One of the problems
that should be addressed in such systems is the cancellation
of stereophonic acoustic echo using a pair of adaptive filters.
Stereophonic acoustic echo cancellation (SAEC) has issues
that are considerably more challenging to overcome than the
monophonic case. The fundamental problem is the
mismatch between the adaptive filter coefficients and the receiving
room acoustic impulse responses. It has been shown that, in
a practical scenario, the adaptive filter misalignment converges
poorly, leading to a performance degradation. This misalignment
problem is caused by the high interchannel coherence that
exists between the two transmitted stereo signals.
A variety of methods have been proposed to address the mis-
alignment problem. These methods revolve around reducing
the interchannel coherence between the two transmitted signals.
One of the first methods involved adding controlled quantities
of independent noise to each input channel, or modulating
them. These two approaches are, however, not feasible since
distortion can be heard even when the added noise level is very
low. Another approach involves the use of comb filtering,
where frequency components of the left and right channels
are separated in order to reduce the interchannel coherence.
Although this method improves the performance of the adap-
tive filters, it degrades the quality of the stereophonic sound,
especially at lower frequencies.
A preprocessor has been proposed to add a nonlinear (NL) function
of the transmitted signal in each channel to the signal itself.
This method employs a half-wave rectifier and is
attractive both for improving the misalignment behavior of
the adaptive filters and for its simplicity of implementation.
Although it introduces less distortion than the other
techniques discussed above, this distortion is still found to be
objectionable in some cases for music applications.
An adaptive nonlinearity control was subsequently proposed to
maintain the desired level of misalignment and to minimize the
audio distortion .
It is apparent by now that algorithms proposed for SAEC
need to decorrelate the transmitted signals without degrading
the quality of the speech signals or destroying the stereophonic
image of the transmission room. In view of this, the use of time-
varying all-pass filtering of the stereophonic signals has been
proposed with the aim of signal decorrelation while maintaining
the stereophonic perception. The use of psychoacoustic
properties to reduce perceived distortion while achieving signal
decorrelation has also been proposed, including the use of spectrally
shaped random noise, gain-controlled phase distortion,
and the combination of comb filtering and all-pass
filtering with respect to the masking effect. These methods
exploit a perceptual property of the human auditory system,
called “noise masking.”
1558-7916/$26.00 © 2010 IEEE
BEKRANI et al.: CLIPPING-BASED SELECTIVE-TAP ADAPTIVE FILTERING APPROACH TO SAEC 1827
More recent advances in SAEC research involve a decorre-
lation procedure for the adaptive weight update , –.
tive filters as opposed to decorrelating the transmitted signals.
tion. This algorithm achieves update signal decorrelation by en-
suring that only an exclusive set of filter coefficients from each
channel is selected for adaptation. In order to reduce any degradation
in convergence performance due to this subselection procedure,
the XM tap-selection strategy further ensures that the
energies of these exclusive tap inputs are maximized. The XM
tap selection has been incorporated with the NL half-wave rectifier,
and the resulting XMNL normalized least-mean-square
(XMNL-NLMS) algorithm has been shown to achieve a higher
rate of misalignment convergence compared to that of nonlinear
NLMS (NL-NLMS).
We propose to further improve the convergence performance
of the XM tap-selection algorithm. This motivation is derived
from the degradation in convergence performance of the XM
tap-selection algorithm when the interchannel coherence be-
for example, when the source in the transmission room is lo-
cated away from the centroid of the stereophonic microphone
pair. We note that the robustness issue of XMNL-NLMS to
the source position has not been investigated and that, in this
work, we present insight into this problem. Utilizing this new
knowledge, we propose to improve the misalignment conver-
gence of XMNL-NLMS by employing a center-clipping algo-
rithm so that the low interchannel coherence and maximization
of tap-input energy criteria can be jointly optimized. The pro-
steady-state misalignment of the adaptive filters will be robust
to the source position in the transmission room.
In  and references therein, the authors evaluated the adap-
tive signal decorrelation filter as a preprocessor and reported
that complete decorrelation in the frequency domain cannot be
achieved unless one or both of the stereophonic signals are zero
at every frequency. This process is undesirable since it destroys
the stereophonic image of the transmitted signals, which is im-
portant to the listeners in the receiving room. Therefore, they
concluded that complete decorrelation is not applicable in prac-
tice. Our proposed method, as opposed to the technique dis-
cussed in , does not apply decorrelation filtering to the trans-
mitted signals. We instead operate on the tap-input vector of
the adaptive filters. Therefore, similar to XMNL-NLMS, the
stereophonic image is preserved. We also note that the decor-
related tap-input vectors of both algorithms may prevent some
of the adaptive filter coefficients from adaptation in some iter-
ations. However, this effect will not significantly degrade the
terchannel coherence due to our center-clipping approach will
bring about an improvement in convergence rate. It is also im-
SAEC problem, while our proposed center-clipping approach
to the source position.
Fig. 1. Stereophonic acoustic echo cancellation for teleconferencing applications.
II. REVIEW OF STEREOPHONIC ACOUSTIC ECHO CANCELLATION
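Fig. 1 shows the stereophonic acoustic echo canceller. For
simplicity, we consider only one microphone in the receiving
room, since similar analysis can be applied to the other channel.
Microphones in the transmission room receive signals produced
by the sound source $s(n)$ via acoustic impulse responses
$\mathbf{g}_j = [g_{j,0}, \ldots, g_{j,L_T-1}]^T$
for Channel $j$, where $L_T$ is the length of the transmission room
impulse responses, while $T$ is defined as the transpose operator.
The transmitted signal to the receiving room for Channel $j$ can
then be expressed as
$$x_j(n) = \mathbf{g}_j^T \mathbf{s}(n),$$
where $\mathbf{s}(n) = [s(n), \ldots, s(n-L_T+1)]^T$. These
signals produce an echo $y(n)$
in the receiving room given by
$$y(n) = \sum_{j=1}^{2} \mathbf{h}_j^T \mathbf{x}_j(n),$$
where $\mathbf{h}_j$ is the $j$th channel receiving room impulse response and
$\mathbf{x}_j(n) = [x_j(n), \ldots, x_j(n-L+1)]^T$ is the $j$th channel tap-input vector, while
$L$ is the length of $\mathbf{h}_j$.
Similar to single-channel AEC, adaptive filters are employed
to estimate $\mathbf{h}_1$ and $\mathbf{h}_2$. In this paper, we assume, similar
to previous work, that the adaptive filters are each of length
$L$, which is equal to that of $\mathbf{h}_j$. In realistic
applications where the adaptive filters are shorter than
$\mathbf{h}_j$, residual echo will be transmitted back to the transmission
room due to the unmodeled "tails" of $\mathbf{h}_j$.
The error between the echo and its estimate can then be expressed as
$$e(n) = y(n) - \sum_{j=1}^{2} \hat{\mathbf{h}}_j^T(n)\, \mathbf{x}_j(n),$$
where $\hat{\mathbf{h}}_j(n)$, $j = 1, 2$, is
the vector of adaptive filter coefficients for the $j$th channel.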
A. Nonlinear Normalized Least-Mean-Square Algorithm
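In order to efficiently reduce $e(n)$, adaptive algorithms are
employed. The normalized least-mean-square (NLMS) algorithm
is popular because of its computational efficiency
and ease of implementation. The two-channel NLMS algorithm
is expressed as
$$\hat{\mathbf{h}}(n+1) = \hat{\mathbf{h}}(n) + \frac{\mu\, \mathbf{x}(n)\, e(n)}{\|\mathbf{x}(n)\|_2^2 + \delta},$$
where $\mathbf{x}(n) = [\mathbf{x}_1^T(n)\ \mathbf{x}_2^T(n)]^T$ and
$\hat{\mathbf{h}}(n) = [\hat{\mathbf{h}}_1^T(n)\ \hat{\mathbf{h}}_2^T(n)]^T$ are the concatenated
tap-input vector and filter coefficient vector, respectively, while
$\mu$ is the step-size, which controls the rate of convergence, and
$\delta$ is a regularization parameter to prevent division by zero.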
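As a rough illustration of the two-channel NLMS recursion described above, the following Python sketch performs a single iteration on concatenated tap-input and coefficient vectors. The function name `nlms_step` and the default values of `mu` and `delta` are assumptions made for illustration only; they are not taken from the paper.

```python
def nlms_step(h_hat, x, d, mu=0.5, delta=1e-6):
    """One NLMS iteration on concatenated two-channel vectors.

    h_hat : current coefficient estimates, channel 1 followed by channel 2
    x     : concatenated tap inputs, channel 1 followed by channel 2
    d     : observed echo sample y(n)
    mu    : step-size (assumed default)
    delta : regularization constant (assumed default)
    Returns the updated coefficients and the a priori error e(n).
    """
    # Echo estimate from the current coefficients.
    y_hat = sum(h * xi for h, xi in zip(h_hat, x))
    e = d - y_hat
    # Regularized input energy used for normalization.
    norm = sum(xi * xi for xi in x) + delta
    h_new = [h + mu * e * xi / norm for h, xi in zip(h_hat, x)]
    return h_new, e
```

Calling `nlms_step` once per sample with the current tap-input vectors of both channels gives the full-update behavior against which the selective-tap variants are compared.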
Unlike single-channel AEC, estimation of
for the stereophonic case is challenging. As shown in , when
, the adaptive filter coefficients are of the form
and are linearly related to
The dependency of
tions due to
and is known as the non-uniqueness
problem. In practical cases where
plication of the standard adaptive filtering is normally not suc-
rise to an ill-conditioned system identification problem. As a
result, the misalignment convergence of the adaptive filters is
impaired significantly. This degradation is known as the mis-
alignment problem .
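In order to address the misalignment problem, a nonlinear
(NL) preprocessor has been proposed. This preprocessor operates
on $x_1(n)$ and $x_2(n)$ such that the modified transmitted
signals $x'_1(n)$ and $x'_2(n)$ are given by
$$x'_1(n) = x_1(n) + 0.5\alpha\,[x_1(n) + |x_1(n)|], \qquad (6)$$
$$x'_2(n) = x_2(n) + 0.5\alpha\,[x_2(n) - |x_2(n)|]. \qquad (7)$$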
1, 2 are vectors given
is appended with is a scalar
on these multiple solu-
, the non-uniqueness
are highly correlated, giving
where $\alpha$ controls the amount of nonlinearity to be added. It has
been shown that a value of $\alpha = 0.5$ is a good compromise
between speech quality and misalignment convergence of the
adaptive filters.
Due to its simplicity in implementation and the low distortion
introduced, the NL preprocessor has become an intrinsic part of
SAEC and has been incorporated into several recently proposed
algorithms for SAEC. For the remainder of this paper,
the NLMS algorithm employing this NL preprocessing will be
referred to as NL-NLMS.
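The half-wave rectifier preprocessing can be sketched in a few lines of Python. This is a minimal illustration that assumes the commonly used form in which a scaled positive half-wave is added to the first channel and a scaled negative half-wave to the second; the function name `nl_preprocess` and the default `alpha=0.5` are illustrative assumptions.

```python
def nl_preprocess(x1, x2, alpha=0.5):
    """Add a half-wave rectified copy of each channel to itself.

    Channel 1 receives its positive half-wave, channel 2 its negative
    half-wave, each scaled by alpha (assumed form of the NL preprocessor).
    """
    # (v + |v|) / 2 keeps only positive samples.
    x1p = [v + 0.5 * alpha * (v + abs(v)) for v in x1]
    # (v - |v|) / 2 keeps only negative samples.
    x2p = [v + 0.5 * alpha * (v - abs(v)) for v in x2]
    return x1p, x2p
```

With this form, positive samples of the first channel and negative samples of the second channel are amplified by a factor of `1 + alpha`, while the remaining samples pass through unchanged.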
B. Exclusive-Maximum (XM) Tap-Selection Algorithm
The XMNL-NLMS update  can be expressed as
1,2 is the channel index,
Fig. 2. Locations of the source and the microphone pair in the transmission
processed tap-input vector of the
by (6) and (7),
th channel defined
is a tap-selection matrix and
denote the elemental indices of
, respectively, andand,
As can be seen, the XM algorithm incorporates a tap-selection
scheme that reduces the interchannel coherence by selecting
exclusive filter coefficients for updating in each channel.
It is important to note that the selected tap inputs are only used
for updating the coefficients and hence no distortion is introduced.
However, with any tap-selection updating strategy, the convergence
performance of the adaptive filters will be reduced.
This degradation is then minimized by jointly maximizing the
of the NL preprocessor is required to provide a solution to the
ill-posed SAEC problem, while the XM approach improves the
convergence rate of NL-NLMS. As a result of this combination,
which we refer to as XMNL, better misalignment convergence
of the adaptive filter can be achieved. Alternatively, the XM tap
selection can be seen as an effective approach to achieve good
misalignment convergence with lower distortion brought about
by a smaller nonlinearity factor
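The exclusive-maximum idea can be illustrated with a short Python sketch. It assumes one common formulation of XM tap selection: the difference of tap-input magnitudes between the two channels is ranked, the first channel updates the half of the taps where that difference is largest, and the second channel updates the half where it is smallest. The function name `xm_select` and the choice of half the taps per channel are assumptions made for this illustration.

```python
def xm_select(x1, x2):
    """Return 0/1 selection masks (q1, q2) for the two channels.

    Assumed XM rule: rank p[i] = |x1[i]| - |x2[i]|; channel 1 takes the
    M largest entries of p, channel 2 the M smallest, with M = L // 2.
    No tap index is selected in both channels.
    """
    L = len(x1)
    M = L // 2
    p = [abs(a) - abs(b) for a, b in zip(x1, x2)]
    # Indices sorted by descending magnitude difference.
    order = sorted(range(L), key=lambda i: p[i], reverse=True)
    q1 = [0] * L
    q2 = [0] * L
    for i in order[:M]:
        q1[i] = 1
    for i in order[L - M:]:
        q2[i] = 1
    return q1, q2
```

The masks play the role of the diagonals of the tap-selection matrices: only the coefficients whose mask entry is 1 are adapted in a given iteration, while the echo estimate itself still uses all coefficients.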
III. EFFECT OF SOURCE POSITION ON MISALIGNMENT
One of the problems that has yet to be considered for
XMNL-NLMS is its robustness to the position of the source
in the transmission room. We now illustrate the misalignment
convergence of XMNL-NLMS by considering different source
positions. Fig. 2 shows an experimental setup where two mi-
crophones are placed at
(3,2,1.5) m in a room with dimensions 7 m
position of the source starting from the front of the
positions (2.7,2,1.5) m and
7 m4 m. We
Fig. 3. Misalignment convergence of the XMNL-NLMS algorithm (solid) for
different positions of the source, as compared to NL-NLMS (dashed).
array centroid at
1.5) m for the microphone and (2.85, 1.8, 1.6) m and (2.4, 1.1,
1.7) m for the two loudspeakers. For the purpose of illustration,
all room impulse responses are generated synthetically using
the method of images  such that
samples and these synthetic impulse responses have lengths
that correspond to their reverberation times. At a sampling
Hz, this corresponds to 23 ms and 36 ms,
respectively. A stationary colored noise source signal
obtained by filtering white Gaussian noise through a low-pass
finite impulse response (FIR) filter with coefficients given
 which was chosen to generate a
speech-like spectrum. The convergence of the algorithms is
quantified by the normalized misalignment
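For readers who wish to reproduce such curves, the sketch below assumes the usual definition of normalized misalignment, namely the squared norm of the coefficient error divided by the squared norm of the true impulse response, expressed in dB. The function name `normalized_misalignment_db` is illustrative, and the vectors are assumed to be the concatenated two-channel responses.

```python
import math


def normalized_misalignment_db(h, h_hat):
    """Normalized misalignment 10*log10(||h - h_hat||^2 / ||h||^2) in dB.

    h     : true (concatenated) receiving-room impulse response
    h_hat : current (concatenated) adaptive filter coefficients
    """
    num = sum((a - b) ** 2 for a, b in zip(h, h_hat))
    den = sum(a * a for a in h)
    if num == 0.0:
        # Perfect identification: misalignment is unbounded below.
        return float("-inf")
    return 10.0 * math.log10(num / den)
```

An all-zero initial estimate gives 0 dB under this definition, and lower values indicate a closer match between the adaptive filters and the room responses.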
m to the front of the right micro-
m. The positions in the receiving room are (3, 2,
dent trials, for the different source positions. The convergence
performance of NL-NLMS for
in front of the microphone pair centroid, has been included for
comparison. Additional tests conducted have shown that the
misalignment convergence of NL-NLMS for various source po-
sitions is comparable to that shown in Fig. 3. A step-size of
was used for XMNL-NLMS, while the step-size of
NL-NLMS was adjusted to
alignment reaches that of XMNL-NLMS. As shown in Fig. 3,
XMNL-NLMS outperforms the full-update NL-NLMS when
the source is directly in front of the microphone pair centroid at
m. On the contrary, the convergence rate of XMNL-
NLMS is reduced significantly when the source is located away
from the microphone pair centroid such as when
To gain further insight into the degradation in convergence
rate of XMNL-NLMS with respect to the source position, we
consider both the interchannel coherence as well as the ratio of
selected tap-input energy to the total tap-input energy.
. Fig. 3 shows the misalign-
m, when it is directly
so that its steady-state mis-
Fig. 4. Average interchannel coherence of the NL-NLMS (dashed) and
XMNL-NLMS (solid) algorithms for various positions of the source.
A. Effect of XM Tap Selection on Interchannel Coherence
We first investigate the effect of XM tap-selection on the in-
terchannel coherence for various source positions. We denote
the XM subselected tap-input vector in (8) as
The interchannel coherence between
and is then
is the cross power spectrum between
is the normalized frequency. For the same conditions
as in Fig. 3, Fig. 4 shows the mean interchannel coherence
and , across different frequencies for var-
ious source positions, obtained by averaging over ten indepen-
As can be seen, when the source is in front of the microphone
m , the XM tap-selection criterion is
giving a low interchannel coherence of 0.43. Due to this effi-
cient decorrelation, a good misalignment convergence shown in
degradation due to tap selection does not significantly outweigh
the benefits brought about by the reduction in interchannel co-
herence due to the exclusivity criterion. Fig. 5(a) shows an ex-
ample of the XM selected taps
clarity, we show only the first 80 samples of
samples. As can be seen, most of the selected
taps in the first channel correspond to elements in
being greater than zero, whereas for the second channel, most
of the active taps correspond to elements in
which increases the magnitude of the positive elements in the
first channel and the negative elements in the second channel.
As a result of XM tap selection on
channel coherence of 0.43 is achieved as shown in Fig. 4.
and employing XM tap selection increases from
in interchannel coherence even after XM tap selection is ap-
and in this case. For
and, low inter-
Fig. 5. Selected tap inputs of the first and second channels for the XMNL-NLMS algorithm
for various source positions: panels (a), (b), and (c).
As can be seen, the effect of NL preprocessing on XM tap se-
lection reduces and the similarity between
creases. The increase in interchannel coherence between
and in turn reduces the convergence rate for XMNL-
NLMS, as can be seen from Fig. 3.
Fig. 5(c) illustrates
front of the right microphone at
being further away from the left microphone, elements in
have magnitudes much lower than those of
it is expected that
difference, the interchannel coherence is lower for
crease when the source moves from
On the contrary, however, the XMNL-NLMS convergence rate
continues to reduce with increasing
from Fig. 3. In Section IV, we gain better insight into this con-
tradictory behavior by studying the effect of XM tap selection
on the energies of the active tap inputs for different source po-
when the source is in
m. Due to the source
. In addition,
. As a result of this
position, as can be seen
B. Effect of XM Tap Selection on Tap-Input Energies
In order to further illustrate how XM tap selection affects the
energies of the tap inputs, we employ the
-ratio criterion 
lection, with elements defined by (9) and (10). Our intention is
not to show the exactrelationship between
ment convergence rate, but to illustrate that the loss of tap-input
energy has an undesirable effect on the convergence rate of the
adaptive filters. Fig. 6 illustrates how
for the XM tap se-
varies with source po-
Fig. 6. Variation of the $\mathcal{M}$-ratio against source position for NL-NLMS (dashed) and XMNL-NLMS (solid).
(14). As can be seen, NL-NLMS has
sitions since all tap-inputs are used for weight update. On the
other hand, for XMNL-NLMS,
position of the source.
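The ratio of selected tap-input energy to total tap-input energy discussed in this subsection can be computed as in the following Python sketch. It assumes that the energies of both channels are pooled in the numerator and denominator; the function name `m_ratio` and this pooling are assumptions made for illustration.

```python
def m_ratio(x1, x2, q1, q2):
    """Ratio of selected tap-input energy to total tap-input energy.

    x1, x2 : tap-input vectors of the two channels
    q1, q2 : 0/1 selection masks for the two channels
    Returns a value in [0, 1]; full update (all ones) gives 1.
    """
    selected = sum(q * v * v for q, v in zip(q1, x1))
    selected += sum(q * v * v for q, v in zip(q2, x2))
    total = sum(v * v for v in x1) + sum(v * v for v in x2)
    return selected / total if total > 0 else 0.0
```

A full-update algorithm selects every tap and therefore always attains a ratio of 1, whereas an exclusive selection discards part of the energy and yields a smaller value.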
We can now see from Figs. 4 and 6 why XMNL-NLMS
achieves poorer convergence performance when the source
is far from the centroid of the microphone pair: although the
interchannel coherence is relatively low, the
sufficiently high to reduce the degradation in convergence rate
due to tap selection. As a consequence of this conflict between
the need to reduce interchannel coherence and maximization
of tap-input energies, the overall result is a reduction in con-
vergence rate of XMNL-NLMS, as can be seen from Fig. 3.
On the other hand, when
vergence rate due to a reduction in
reduction in interchannel coherence, as shown in Figs. 6 and
4, respectively. As a result of this joint effect, good overall
convergence performance can be obtained for XMNL-NLMS,
as can be seen in Fig. 3.
As an additional note, the position of the source affects not
only the misalignment convergence of XMNL-NLMS, but also
its steady-state value. As can be seen from Fig. 3, the steady-
state normalized misalignment is higher than that of NL-NLMS
position since the weight update is performed
using only a fraction of the tap inputs. This causes an additional
error in the weight update, resulting in an increase in the steady-state
normalized misalignment.
for all source po-
and increases with
-ratio is not
m, the degradation in con-
is offset by a significant
IV. CENTER-CLIPPING APPROACH TO SAEC
We now propose a center-clipping algorithm that has the
ability to reduce the interchannel coherence according to
the source location in the transmission room. We exploit the
decorrelation properties of XM tap selection, similar to that of
XMNL-NLMS. In addition we propose an error-based com-
pensation technique that addresses the additional steady-state
normalized misalignment resulting from XM tap selection.
A. Center-Clipping Exclusive Maximum Tap Selection
We propose to apply center-clipping to the tap-input vectors
in order to increase the energies of the “inactive” tap inputs
when the interchannel coherence between
relatively low, such as when the source is in front of one of the
microphones. This is to reduce the degradation in misalignment
Fig. 7. Schematic diagram of the proposed structure.
will be shown in Section IV-D, the clipping threshold is based
on indirect estimation of the similarity between the energies of
and . This similarity reflects how close the source
is to the microphone centroid, which in turn affects the con-
vergence behavior. The proposed approach ensures a soft-op-
timization constraint, which makes it robust to source position.
A schematic diagram of the proposed method is as shown in
The proposed clipping-based XMNL-NLMS algorithm
(cXMNL-NLMS) updates the filter coefficients using
selection matrix with diagonal elements defined in (9) and (10),
matrix is defined by
is defined in (6) and (7), is the XM tap-
not selected by XM. The purpose of the proposed center-clip-
ping strategy is to increase the energies of tap inputs corre-
sponding to the “inactive” (unselected) taps. We achieve this by
as the clipped vector whose elements are computed by (18),
as shown at the bottom of the page, where
amount of clipping for
. In Section V, we discuss how this
clipping threshold can be determined for our SAEC application.
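The role of the clipped vector in the update can be sketched in Python as follows. The sketch assumes the classical center-clipping characteristic, in which samples with magnitude below the threshold are set to zero and larger samples are shifted toward zero by the threshold; it also assumes that selected taps use the unclipped input while unselected taps use the clipped input. The names `center_clip` and `clipped_update_vector` are hypothetical.

```python
def center_clip(x, threshold):
    """Classical center clipper (assumed form of the clipping rule)."""
    out = []
    for v in x:
        if v > threshold:
            out.append(v - threshold)
        elif v < -threshold:
            out.append(v + threshold)
        else:
            out.append(0.0)
    return out


def clipped_update_vector(x, q, threshold):
    """Combine selected taps (unclipped) with unselected taps (clipped).

    x         : NL-processed tap-input vector of one channel
    q         : 0/1 XM selection mask for that channel
    threshold : clipping threshold for that channel
    """
    clipped = center_clip(x, threshold)
    return [v if sel else c for v, sel, c in zip(x, q, clipped)]
```

Under these assumptions a zero threshold returns the full tap-input vector, which corresponds to a full update, while a threshold equal to the largest magnitude in the vector zeroes every unselected tap, which corresponds to pure XM selection.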
is used to identify the tap-input elements
B. Effect of
on Tap-Input Energy
The range of
bounded between zero and the maximum magnitude of any el-
ement within that vector, i.e.,
for each tap-input vectorcan be
Fig. 8. Variation of the $\mathcal{M}$-ratios against the normalized clipping threshold for the XM tap-selection
algorithm and center-clipping algorithm, respectively.
It can be seen from (18) that when, we have
, which results in the second term of (16)
having values equivalent to the unselected tap inputs, so that
and the proposed cXMNL-NLMS algorithm be-
comes the full-update NL-NLMS algorithm. On the other hand,
, we have
second term of (16) to vanish, hence reducing cXMNL-NLMS
In order to illustrate how the clipping threshold
the energies of the tap-input vectors
-ratio criterion similar to the one defined by (14)
, causing the
, we em-and
where the subscript
Fig. 8 illustrates how
using signals generated by convolving the previously defined
colored speech-like noise sequence with
the source position is at (2.85,1.85,1.6) m. If
and therefore the convergence behavior of
the center-clipped cXMNL-NLMS algorithm will be equivalent
to NL-NLMS. On the other hand, when
and hence the performance of the proposed
in denotes center-clipped signals.
C. Effect of Clipping Threshold on Interchannel Coherence
As illustrated in Section III, the misalignment conver-
gence of XMNL-NLMS depends on both the tap-input
energy as well as the interchannel coherence between
and . As such, we investigate the effect of
the interchannel coherence between
convolve the same colored Gaussian noise sequence with
and , where
sitioned at coordinates (2.85,1.85,1.6) m. A total of ten
, with the source po-
Fig. 9. Interchannel coherence versus clipping threshold for colored noise
Fig. 10. Variation of the dissimilarity measure against the horizontal position of the source in the
transmission room for four cases of vertical position.
independent trials are averaged and Fig. 9 illustrates how the
mean interchannel coherence across frequency varies with
, where we constrain
the interchannel coherence reduces with increasing values of
clipping threshold. This is as expected since with increasing
, less energy will be allocated to the unselected taps
brought about by the XM tap-selection criterion, thus reducing
the similarity between
quence causes a reduction in interchannel coherence.
. As can be observed,
, which as a conse-
D. Soft-Decision Rule for Clipping Threshold
As shown in Figs. 8 and 9, a high value of
and interchannel coherence. As described in
Section III, a reduction in interchannel coherence is crucial
when the source position is near the centroid of the microphone
array pair, while the need to increase
when the source is nearer to one microphone. Hence,
enables a tradeoff between interchannel decorrelation and
degradation of misalignment convergence due to tap selection.
is desirable when the source is near the
centroid of the microphone pair, while low
when the source is near one of the microphones. As was pointed
out, the similarity between the energies of
contributes to the high convergence rate of XMNL-NLMS
when the source is near the microphone pair centroid. On the
other hand, considerable difference in the energies as well as
low interchannel coherence between
convergence rate of the XMNL-NLMS to reduce significantly.
Fig. 11. Clipping threshold versus the dissimilarity measure.
We therefore propose to use the difference between absolute
values of the two channels as a measure of energy dissimilarity.
It is foreseeable that when the source is near the microphone
array centroid, the relative absolute values of
are approximately equal. On the contrary, when the source is
nearer to one microphone, the relative absolute values of these
tap-input vectors differ from each other. Thus, the dissimilarity
measure is defined as
solute values of the input elements given by
,1, 2, is a moving average of the ab-
Here, we use
of instantaneous changes of
threshold. We see that
to 0 when the received energy of the two microphones are ap-
proximately equal and approaches 1 when the received energy
from one microphone is much larger than that of the other mi-
Fig. 10 shows four illustrative examples of how
room. In these examples, the room dimensions and the coordi-
nates of microphones are given as shown in Fig. 2. As expected,
the value of
is small when the source is close to the cen-
m. On the contrary,
source is near Microphone 2 at
when the source is nearer to one microphone and vice versa,
a piecewise linear mapping
to smooth so as to avoid the effects
on the value of the clipping
and thatis close
is large when the
The relation between
speech signals, we use values
were determined empirically.
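The soft-decision rule can be illustrated with the following Python sketch. Every specific in it is an assumption made for illustration: the moving average is implemented as an exponential smoother with factor `lam`, the dissimilarity is taken as the normalized absolute difference of the two smoothed magnitudes, and the breakpoints `xi_low` and `xi_high` of the piecewise linear mapping are hypothetical values, not the empirically determined ones used in the paper.

```python
def smoothed_abs(prev, x_new, lam=0.99):
    """Exponentially weighted moving average of |x| (assumed smoother)."""
    return lam * prev + (1.0 - lam) * abs(x_new)


def dissimilarity(a1, a2, eps=1e-12):
    """Normalized difference of smoothed magnitudes, between 0 and 1."""
    return abs(a1 - a2) / (a1 + a2 + eps)


def threshold_fraction(xi, xi_low=0.2, xi_high=0.6):
    """Piecewise linear map from dissimilarity to a normalized threshold.

    Returns 1 (pure XM selection) for small dissimilarity and 0 (full
    update) for large dissimilarity, with a linear ramp in between.
    The breakpoints are hypothetical.
    """
    if xi <= xi_low:
        return 1.0
    if xi >= xi_high:
        return 0.0
    return (xi_high - xi) / (xi_high - xi_low)
```

Multiplying the returned fraction by the largest magnitude in a channel's tap-input vector would then give a per-channel clipping threshold that is high when the source is near the microphone pair centroid and low when it is close to one microphone.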
We note from (23) that
each channel, and indirectly depends on the relative position of
the source and the microphones. In addition, when
such as when the source is near the microphone pair centroid,
increases with reducing
the second term in (16) and as a result, the proposed algorithm
converges in the same manner as XMNL-NLMS. On the other
and is plotted in Fig. 11. For
is independently determined for
. This reduces the effect of
Fig. 12. Comparison of the $\mathcal{M}$-ratios for NL-NLMS, XMNL-NLMS, and cXMNL-NLMS.
of the microphones,
which in turn reduces the degradation in misalignment conver-
gence due to XM tap selection in situations where reducing the
interchannel coherence cannot further improve the convergence
rate of the adaptive algorithm.
Fig. 12 further illustrates how degradation in
consequently the convergence performance due to tap se-
lection, can be reduced by incorporating
proposed cXMNL-NLMS algorithm. As described earlier in
Section III-B, the degradation of convergence performance
for XMNL-NLMS when the source is far away from the mi-
crophone pair centroid
compared to NL-NLMS. In this scenario, the proposed
cXMNL-NLMS algorithm ensures that
value 1 achieved by NL-NLMS. On the other hand, when the
source is near the centroid
tains the beneficial properties of the XM tap-selection strategy
to maximally decorrelate the tap-input vectors. The overall
joint result is a fast converging cXMNL-NLMS that is robust
to the source position in the transmission room.
increases, such as when the source is near one
reduces to zero. As shown in Fig. 8,
is due to a reduction of
is closer to the
m , cXMNL-NLMS at-
V. ENHANCEMENT OF THE STEADY-STATE PERFORMANCE
As noted from Fig. 3 and Section III, the steady-state nor-
malized misalignment of XMNL-NLMS is higher than that of
NL-NLMS. This is due to the unselected filter coefficients in-
troducing an error during adaptation since there is now a mis-
, which drives the unknown system, and
the selective tap-input vector
regardless of the source position. It is therefore expected that
cXMNL-NLMS also suffers from increased steady-state mis-
that is different from
To illustrate this, we consider input vectors
(17) each of length
. A colored noise source signal
generated as described in Section III, is positioned at coordi-
nates (2.89,1.85,1.6) m. As before, the room is of dimension
as shown in Fig. 2. This steady-state normalized misalignment
is achieved by allowing the algorithm to reach its steady-state
and averaging over the last 5000 samples. Fig. 13 shows how
the steady-state normalized misalignment varies with normal-
ized clipping thresholds
. This error occurs
Fig. 13. Relation between steady-state misalignment and normalized clipping
threshold in NL-NLMS, XMNL-NLMS, and cXMNL-NLMS, when the source
is at (2.89,1.85,1.6) m.
As can be seen, the steady-state normalized misalignment for
29 dB, and 24 dB for XMNL-NLMS. If we
defined by (23) for the above source position,
NLMS algorithm gives an additional 1 dB of steady-state nor-
malized misalignment improvement over XMNL-NLMS.
As a final improvement, we propose to enhance the
cXMNL-NLMS. We note that the steady-state performance of
cXMNL-NLMS depends on the source position and therefore
, we need to reduce the additional steady-state
normalized misalignment. Hence, we propose to reduce
to zero after convergence of the mean-square error (MSE).
To estimate the convergence of the algorithm, we employ the
following recursive relation for estimating the MSE 
, so the proposed cXMNL-
process. We consider
In this case, the normalized misalignment reduces towards the
steady-state misalignment of NL-NLMS. We therefore propose
into (23) giving (25),
for our experiments. Hence,
achieve a low level of MSE after convergence. Fig. 14 shows
an example of MSE and misalignment convergence for a single
trial when the source is at coordinates (2.87,1.85,1.6) m. As can
be seen, setting
based on (23) brings about a higher initial
convergence rate than NL-NLMS, while reducing
using (25) after MSE convergence will bring about additional
reduction in steady-state normalized misalignment. Additional
tests revealed that although the convergence of MSE occurs
before convergence of misalignment, exact knowledge of MSE
is not required. The proposed cXMNL-NLMS algorithm is
summarized in Table I.
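The steady-state enhancement described in this section can be sketched in Python as follows. The sketch assumes a standard exponentially weighted recursion for the MSE estimate and a simple rule that sets the clipping threshold to zero once the estimate falls below a lower limit; the names `update_mse`, `threshold_after_convergence`, `lam`, and `mse_floor` are hypothetical and their values are not taken from the paper.

```python
def update_mse(prev_mse, e, lam=0.99):
    """Recursive MSE estimate: lam * previous + (1 - lam) * e^2."""
    return lam * prev_mse + (1.0 - lam) * e * e


def threshold_after_convergence(threshold, mse, mse_floor):
    """Drop the clipping threshold to zero once the MSE has converged.

    threshold : threshold given by the soft-decision rule
    mse       : current recursive MSE estimate
    mse_floor : empirically chosen lower limit (hypothetical value)
    """
    return 0.0 if mse < mse_floor else threshold
```

With a zero threshold all tap inputs contribute to the update, so after MSE convergence the algorithm behaves as a full-update NL-NLMS and the additional steady-state misalignment caused by tap selection is avoided.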
is an empirically-derived lower limit that aims to
Fig. 14. Estimated MSE and misalignment for the cXMNL-NLMS algorithm.
TABLE I
THE cXMNL-NLMS ALGORITHM
VI. FURTHER SIMULATION AND EXPERIMENTAL RESULTS
We evaluate, by way of further simulation, the performance
of cXMNL-NLMS under different source positions. In order
to simulate the SAEC system, impulse responses
, , and were generated using the method
of images . To evaluate the robustness of the algorithms,
we fixed the location of the microphones while the source
position in the transmission room was varied across the three
cases shown in Table II (specifications of the simulated
environment in SAEC). A sampling rate of
was used throughout the experiment. The source signal was
generated by filtering a white Gaussian noise signal through a
low-pass finite impulse response (FIR) filter with coefficients
, as was used in Section III.
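As a rough illustration of how such a colored source signal can be produced, the sketch below filters white Gaussian noise through an FIR filter. The filter coefficients are not recoverable from this copy of the text, so `EXAMPLE_COEFFS` is a purely hypothetical low-pass placeholder, and the function names are likewise assumptions.

```python
import random

# Hypothetical low-pass coefficients for illustration only; the values used
# in the experiments are not available in this copy of the text.
EXAMPLE_COEFFS = [0.5, 0.3, 0.2]


def fir_filter(signal, coeffs):
    """Direct-form FIR filtering: y[n] = sum_k coeffs[k] * signal[n - k]."""
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k, c in enumerate(coeffs):
            if n - k >= 0:  # samples before the start are treated as zero
                acc += c * signal[n - k]
        out.append(acc)
    return out


def colored_noise(num_samples, coeffs, seed=0):
    """Generate zero-mean, unit-variance WGN and color it with an FIR filter."""
    rng = random.Random(seed)
    white = [rng.gauss(0.0, 1.0) for _ in range(num_samples)]
    return fir_filter(white, coeffs)
```

Fixing the seed keeps independent trials reproducible; a different seed per trial gives independent realizations of the source.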
We compare the convergence performance of the proposed
cXMNL-NLMS algorithm with that of NL-NLMS and
XMNL-NLMS. Since the steady-state normalized misalignment
for XMNL-NLMS varies with the source position, we
chose its step-size so that its steady-state normalized
misalignment reaches that of NL-NLMS and cXMNL-NLMS
when the source position is in front of the microphone array
centroid at (2.85,1.85,1.6) m. This corresponds to
for both NL-NLMS and cXMNL-NLMS and
XMNL-NLMS. White Gaussian noise (WGN) is added to
to achieve an dB. For all simulations, we have used
dB for cXMNL-NLMS. The normalized misalignment
curves, obtained by averaging over ten independent trials,
are plotted for Cases 1, 2, and 3 (Table II) in Fig. 15(a)–(c).
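Adding WGN to reach a prescribed SNR amounts to scaling the noise variance to the measured signal power. The sketch below shows one way to do this; the function names and the default seed are assumptions for illustration, and the target SNR value is left as a parameter because it is not recoverable from this copy of the text.

```python
import math
import random


def signal_power(x):
    """Average power (mean of squared samples)."""
    return sum(v * v for v in x) / len(x)


def add_noise_at_snr(signal, snr_db, seed=0):
    """Return signal plus zero-mean WGN scaled to the requested SNR in dB."""
    rng = random.Random(seed)
    # SNR = 10*log10(P_signal / P_noise)  =>  P_noise = P_signal / 10^(SNR/10)
    noise_power = signal_power(signal) / (10.0 ** (snr_db / 10.0))
    sigma = math.sqrt(noise_power)
    return [v + rng.gauss(0.0, sigma) for v in signal]
```

Because the noise is random, the SNR measured on a finite record only approximates the target, and the approximation tightens as the record grows.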
Fig. 15(a) shows the convergence performance of the algorithms
where the source is directly in front of the right microphone.
In this case, are significantly different
and hence a high value of defined in (21) is expected.
As shown in Fig. 11, this translates to a low , and
consequently, as shown in (18), . As a result,
the convergence performance of cXMNL-NLMS is equivalent
to that of NL-NLMS. The proposed cXMNL-NLMS algorithm
thus achieves nearly 8 dB better initial convergence than
XMNL-NLMS and reaches a steady-state normalized misalignment
that is 4 dB lower, as expected.
Fig. 15(b) shows convergence results when the source position
is mid-way between the microphone pair centroid and the
right microphone at (2.88,1.85,1.6) m. Now, the interchannel
coherence increases relative to the previous case and, as can be
seen from this result, cXMNL-NLMS achieves the highest rate
of initial convergence, improving on that of NL-NLMS by nearly
4 dB. We note that when compared to
XMNL-NLMS, cXMNL-NLMS achieves approximately 3 dB
improvement during initial convergence and about 2 dB lower
steady-state normalized misalignment.
Fig. 15. Normalized misalignment of the NL-NLMS, XMNL-NLMS, and
cXMNL-NLMS algorithms. (a) Source
in front of right microphone at (3,1.9,1.55) m. (b) Source at (2.88,1.85,1.6) m.
(c) Source in the center of microphone pair at (2.85,1.85,1.6) m.
Finally, when the source position is in front of the micro-
phone pair centroid at coordinates (2.85,1.85,1.6) m,
and is high. As can be seen from Fig. 15(c),
cXMNL-NLMS achieves the highest rate of convergence,
with an improvement of approximately 4 dB over that
of XMNL-NLMS and nearly 10 dB over that of NL-NLMS. In
terms of steady-state normalized misalignment, the NL-NLMS
To further illustrate the convergence performance of the
proposed cXMNL-NLMS algorithm, we simulated the SAEC
system using a speech signal as shown in Fig. 16. In this
example, the speech signal is sampled at 11025 Hz and a WGN is
added to achieve an SNR dB. The position of the
source in the transmission room is (2.88,1.85,1.6) m. As can be
seen from this result, cXMNL-NLMS achieves approximately
6 dB lower misalignment than NL-NLMS and 4 dB lower than
XMNL-NLMS during initial convergence.
Fig. 16. Normalized misalignment for the NL-NLMS, XMNL-NLMS, and
cXMNL-NLMS algorithms when the source is at (2.88,1.85,1.6) m for a speech
signal.
We next consider recorded impulse responses. The dimensions
of the transmission room are 6.5 m × 8.75 m × 2.65 m, and
the source was positioned at (3.25,4.37,1.15) m while the
two microphones were placed at (3.11,2.37,1.2) m and
(3.39,2.37,1.2) m for Case 1 and at (3.25,2.37,1.2) m and
(2.83,2.37,1.2) m for Case 2. The estimated reverberation time
was 280 ms. These impulse responses were of length 3087
samples and were subsequently truncated to 512 samples. Fig. 17
shows one of the measured impulse responses in the transmission
room. For this experiment, the sampling frequency,
step-sizes, and SNRs were the same as those of the
previous simulations. The results are shown in Fig. 18. As
can be seen from Fig. 18(a), cXMNL-NLMS achieves nearly
3 dB improvement in convergence performance compared to
XMNL-NLMS when the source is in front of the microphone
centroid. In Fig. 18(b), when the source is in front of the right
microphone, the proposed algorithm achieves nearly 6 dB
improvement in convergence compared to XMNL-NLMS.
Fig. 17. Illustration of measured transmission room impulse response.
Fig. 18. Normalized misalignment of the NL-NLMS, XMNL-NLMS, and
cXMNL-NLMS algorithms for real room impulse responses. (a) Case 1: source
in the center of microphone pair. (b) Case 2: source approximately in front of
the right microphone.
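For readers reproducing the comparisons above, the sketch below computes a normalized misalignment in dB, assuming the usual definition: the norm of the coefficient error divided by the norm of the true echo paths, with the two echo paths stacked into one vector. The function name and argument layout are assumptions for illustration.

```python
import math


def normalized_misalignment_db(true_paths, estimated_paths):
    """Return 20*log10(||h - h_hat|| / ||h||) over the stacked echo paths.

    true_paths and estimated_paths are lists of coefficient lists, e.g. one
    list per channel in the stereophonic case (an assumed layout).
    """
    h = [c for path in true_paths for c in path]
    h_hat = [c for path in estimated_paths for c in path]
    if len(h) != len(h_hat):
        raise ValueError("true and estimated paths must have equal length")
    ref = math.sqrt(sum(a * a for a in h))
    if ref == 0.0:
        raise ValueError("true echo paths must not be all zero")
    err = math.sqrt(sum((a - b) ** 2 for a, b in zip(h, h_hat)))
    if err == 0.0:
        return float("-inf")  # perfect identification
    return 20.0 * math.log10(err / ref)
```

With all-zero filter estimates the function returns 0 dB, which is the usual starting point of a misalignment curve.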
We presented a new approach to improve the misalignment
convergence as well as the steady-state performance and ro-
bustness of adaptive filters for SAEC. This approach retains
the decorrelation properties of the XM selective-tap algorithm
when the source is located near the microphone centroid, but
employs a variable center-clipping threshold whose value is derived
from the absolute values of the received microphone
signals in order to improve performance when the source is located closer
to one of the microphones. The proposed approach achieves
better convergence performance for different source positions
in comparison to both NL-NLMS and XMNL-NLMS.
tion Systems, Y. Huang and J. Benesty, Eds..
 J. Benesty, M. M. Sondhi, and Y. Huang, Handbook of Speech Pro-
cessing. Secaucus, NJ: Springer-Verlag, 2008.
 J. Benesty, T. Gänsler, D. R. Morgan, M. M. Sondhi, and S. L. Gay,
Advances in Network and Acoustic Echo Cancellation.
 J. Benesty, D. R. Morgan, and M. M. Sondhi, “A better understanding
and an improved solution to the specific problems of stereophonic
acoustic echo cancellation,” IEEE Trans. Speech Audio Process., vol.
6, no. 2, pp. 156–165, Mar. 1998.
 M. M. Sondhi and D. R. Morgan, “Acoustic echo cancellation for
stereophonic teleconferencing,” in Proc. IEEE Workshop Applicat.
Signal Process. Audio Acoust., 1991, pp. 141–142.
 S. Shimauchi and S. Makino, “Stereo projection echo canceller with
true echo path estimation,” in Proc. IEEE Int. Conf. Acoust., Speech,
Signal Process., 1995, pp. 3059–3062.
 M. M. Sondhi, D. R. Morgan, and J. L. Hall, “Stereophonic acoustic
echo cancellation-An overview of the fundamental problem,” IEEE
Signal Process. Lett., vol. 2, no. 8, pp. 148–151, Aug. 1995.
 J. Benesty, D. R. Morgan, J. L. Hall, and M. M. Sondhi, “Stereophonic
acoustic echo cancellation using nonlinear transformations and comb
filtering,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process.,
1998, pp. 3673–3676.
 J. Benesty, D. R. Morgan, J. L. Hall, and M. M. Sondhi, “Synthesized
stereo combined with acoustic echo cancellation for desktop confer-
encing,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process.,
1999, pp. 148–158.
 K. Mayyas, “Stereophonic acoustic echo cancellation using lattice or-
thogonalization,” IEEE Trans. Speech Audio Process., vol. 10, no. 7,
pp. 517–525, Oct. 2002.
Trans. Speech Audio Process., vol. 10, no. 5, pp. 257–267, Jul. 2002.
 M. Ali, “Stereophonic acoustic echo cancellation system using time-
varying all-pass filtering for signal decorrelation,” in Proc. IEEE Int.
Conf. Acoust., Speech, Signal Process., 1998, pp. 3689–3692.
varying all-pass filters using spectral-shaped noise for signal decorre-
lation in stereophonic acoustic echo cancellation,” in Proc. Int. Conf.
Digital Signal Process., 2002, pp. 87–92.
 J. Herre, H. Buchner, and W. Kellermann, “Acoustic echo cancellation
ment,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., 2007,
 J. M. Valin, “Perceptually-motivated nonlinear channel decorrelation
for stereo acoustic echo cancellation,” in Proc. Hands-Free Speech
Commun. Microphone Arrays (HSCMA), 2008, pp. 188–191.
lation algorithm using adaptive update on the basis of enhanced input-
signal vector,” Signal Process., vol. 86, pp. 1157–1167, Jun. 2006.
 A. W. H. Khong and P. A. Naylor, “Stereophonic acoustic echo can-
cellation employing selective-tap adaptive algorithms,” IEEE Trans.
Speech Audio Process., vol. 14, no. 3, pp. 785–796, May 2006.
Norwell, MA: Kluwer,
 M. Bekrani, A. W. H. Khong, and M. Lotfizad, “Neural network based
adaptive echo cancellation for stereophonic teleconferencing applica-
tion,” in Proc. Int. Conf. Multimedia Expo, 2010, pp. 1172–1177.
 M. Bekrani, M. Lotfizad, and A. W. H. Khong, “An efficient quasi
LMS/Newton adaptive algorithm for stereophonic acoustic echo can-
cellation,” in Proc. IEEE Asia Pacific Conf. Circuits Syst., 2010.
 S. Haykin, Adaptive Filter Theory.
 D. R. Morgan, J. L. Hall, and J. Benesty, “Investigation of several types
of nonlinearities for use in stereo acoustic echo cancellation,” IEEE
Trans. Speech Audio Process., vol. 9, no. 6, pp. 686–696, Sep. 2001.
 J. B. Allen and D. A. Berkley, “Image method for efficiently simu-
lating small-room acoustics,” J. Acoust. Soc. Amer., vol. 65, no. 4, pp.
943–950, Apr. 1979.
 S. Attallah, “The wavelet transform-domain LMS adaptive filter with
press Briefs, vol. 53, no. 1, pp. 8–12, Jan. 2006.
 K. Mayyas, “New transform-domain adaptive algorithms for acoustic
Englewood Cliffs, NJ: Prentice-
Mehdi Bekrani was born in Gorgan, Iran, in 1979.
He received the B.Sc. degree from Ferdowsi University
of Mashhad, Mashhad, Iran, in 2002, and the
M.Sc. and Ph.D. degrees from Tarbiat Modares Uni-
versity, Tehran, Iran, in 2004 and 2010, respectively,
all in electrical engineering.
He is currently a Research Fellow at Nanyang
Technological University, Singapore. His current
research interests include acoustic signal processing
and its applications.
Andy W. H. Khong (M’06) received the B.Eng. de-
gree from Nanyang Technological University, Singa-
pore, in 2002 and the Ph.D. degree from the Depart-
ment of Electrical and Electronic Engineering, Im-
perial College London, London, U.K., in 2005. His
Ph.D. research was mainly on partial-update and se-
lective-tap adaptive algorithms with applications to
mono- and multi-channel acoustic echo cancellation
for hands-free telephony.
He is currently an Assistant Professor in the
School of Electrical and Electronic Engineering,
Nanyang Technological University, Singapore. Prior to that, he served as
a Research Associate in the Department of Electrical and Electronic En-
gineering, Imperial College London, from 2005 to 2008. His postdoctoral
research involved the development of signal processing algorithms for vehicle
destination inference as well as the design and implementation of acoustic
array and seismic fusion algorithms for perimeter security systems. He has also
published works on acoustic blind channel identification and equalization for
speech dereverberation. His other research interests include human-computer
interfaces, source localization, speech enhancement, and blind deconvolution.
Mojtaba Lotfizad was born in Tehran, Iran, in 1955.
He received the B.S. degree in electrical engineering
from AmirKabir University of Technology, Tehran,
in 1980, and the M.S. and Ph.D. degrees from the
He joined the Department of Electrical and
Computer Engineering, Tarbiat Modares University,
Tehran, Iran. He has also been a Consultant to sev-
eral industrial and governmental organizations. His
current research interests are in signal processing,
adaptive filtering, speech processing, and specialized processors.