Quaternion Denoising Encoder-Decoder
for Theme Identification of Telephone Conversations
Titouan Parcollet, Mohamed Morchid, Georges Linarès
LIA, University of Avignon (France)
{firstname.lastname}@univ-avignon.fr
Abstract
In the last decades, encoder-decoders, or autoencoders (AE), have received great interest from researchers due to their capability to construct robust representations of documents in a low-dimensional subspace. Nonetheless, autoencoders reveal little of the internal structure of spoken documents, since they consider the words or topics contained in a document as isolated basic elements, and they tend to overfit on small corpora of documents. Therefore, Quaternion Multi-layer Perceptrons (QMLP) have been introduced to capture such internal latent dependencies, while denoising autoencoders (DAE) employ different stochastic noises to better process small sets of documents. This paper presents a novel autoencoder, called the "Quaternion Denoising Encoder-Decoder" (QDAE), based on both the previously proposed DAE (to manage small corpora) and the QMLP (to consider internal latent structures). Moreover, the paper defines an original angular Gaussian noise adapted to the specificities of hyper-complex algebra. The experiments, conducted on a theme identification task of spoken dialogues from the DECODA framework, show that the QDAE obtains promising gains of 3% and 1.5% over the standard real-valued denoising autoencoder and the QMLP respectively.
Index Terms: Spoken language understanding, Neural net-
works, Quaternion algebra, Denoising encoder-decoder neural
networks
1. Introduction
A basic encoder-decoder neural network [1] (AE) consists of two neural networks (NN): an encoder that maps an input vector into a low-dimensional, fixed-size context vector, and a decoder that generates a target vector by reconstructing this context vector. Multidimensional data, such as the latent structures of spoken dialogues, are difficult to capture with traditional autoencoders due to the unidimensionality of the real numbers they employ. [2, 3] introduced a quaternion-based multilayer perceptron (QMLP), together with a specific segmentation of spoken dialogues, to better capture internal structures thanks to the Hamilton product [4], and thus achieved better accuracies than real-valued multilayer perceptrons (MLP) on a theme identification task of spoken dialogues. A quaternion encoder-decoder was then proposed by [5] to take advantage of the multidimensionality of hyper-complex numbers to encode the latent relations between pixel colors. However, both quaternion- and real-valued autoencoders suffer from overfitting and degraded generalization capabilities when dealing with small corpora of documents [6]. Indeed, autoencoders try to map the initial vector into a low-dimensional subspace, and their effectiveness is thus highly correlated with the number of patterns to learn. To overcome this drawback, a stochastic encoder-decoder called the denoising autoencoder (DAE) was proposed by [6] and investigated in [7, 8, 9]. Intuitively, a denoising autoencoder encodes artificially corrupted inputs and tries to reconstruct the initial vector. By learning from this noisy representation, the DAE tends to better abstract patterns in a reduced, robust subspace.
This paper proposes a novel quaternion denoising encoder-decoder (QDAE) that takes into account the internal document structure (like the QMLP) and is able to manage small corpora (like the DAE). Nonetheless, traditional noises, such as additive isotropic Gaussian noise [10], were designed for real-valued autoencoders. Therefore, we also propose a Gaussian angular noise (GAN) adapted to the quaternion algebra. The experiments on the DECODA telephone conversation framework show the impact of the different noises, and underline the performance of the proposed QDAE over the DAE, AE, MLP and QMLP.
The rest of the paper is organized as follows: Section 2 presents the quaternion encoder-decoder and Section 3 details the experimental protocol. The results are discussed in Section 4 before concluding in Section 5.
2. Quaternion Denoising Encoder-Decoder
The proposed QDAE is a denoising autoencoder built on quaternion numbers. Section 2.1 details the quaternion properties required for the QAE and QDAE algorithms, which are presented in Sections 2.2 and 2.3.
2.1. Quaternion algebra
Quaternion algebra $\mathbb{Q}$ is an extension of complex numbers defined in a four-dimensional space as a linear combination of four basis elements, denoted $1, i, j, k$, that represent a rotation. A quaternion $Q$ is written as $Q = r1 + xi + yj + zk$. In a quaternion, $r$ is the real part while $xi + yj + zk$ is the imaginary part ($I$), or the vector part. A set of basic quaternion properties needed for the QDAE definition are defined as follows:
• all products of $i, j, k$: $i^2 = j^2 = k^2 = ijk = -1$
• the quaternion conjugate $Q^*$ of $Q$ is: $Q^* = r1 - xi - yj - zk$
• the inner product between two quaternions $Q_1$ and $Q_2$ is: $\langle Q_1, Q_2 \rangle = r_1 r_2 + x_1 x_2 + y_1 y_2 + z_1 z_2$
• the normalization of a quaternion: $Q^{\triangleleft} = \frac{Q}{\sqrt{r^2 + x^2 + y^2 + z^2}}$
• the rotation through the angle of a unit quaternion $R^{\triangleleft}$: $Q' = R^{\triangleleft} Q R^{\triangleleft *}$
• the Hamilton product $\otimes$ between $Q_1$ and $Q_2$ encodes latent dependencies and is defined as follows:
$Q_1 \otimes Q_2 = (r_1 r_2 - x_1 x_2 - y_1 y_2 - z_1 z_2) + (r_1 x_2 + x_1 r_2 + y_1 z_2 - z_1 y_2)i + (r_1 y_2 - x_1 z_2 + y_1 r_2 + z_1 x_2)j + (r_1 z_2 + x_1 y_2 - y_1 x_2 + z_1 r_2)k$
$Q_1 \otimes Q_2$ performs an interpolation between two rotations following a geodesic over a sphere in the $\mathbb{R}^3$ space. More about hyper-complex numbers can be found in [4, 11, 12] and about quaternion algebra in [13].
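As a concrete illustration of these operations, the following minimal Python sketch (our own illustration, not code from the paper) stores a quaternion as a NumPy array [r, x, y, z] and implements the conjugate, inner product, normalization, Hamilton product and rotation defined above:

```python
import numpy as np

def conjugate(q):
    """Quaternion conjugate: Q* = r - xi - yj - zk."""
    return np.array([q[0], -q[1], -q[2], -q[3]])

def inner(q1, q2):
    """Inner product <Q1, Q2> = r1r2 + x1x2 + y1y2 + z1z2."""
    return float(np.dot(q1, q2))

def normalize(q):
    """Unit quaternion Q / sqrt(r^2 + x^2 + y^2 + z^2)."""
    return q / np.sqrt(np.dot(q, q))

def hamilton(q1, q2):
    """Hamilton product Q1 (x) Q2 as defined in Section 2.1."""
    r1, x1, y1, z1 = q1
    r2, x2, y2, z2 = q2
    return np.array([
        r1*r2 - x1*x2 - y1*y2 - z1*z2,
        r1*x2 + x1*r2 + y1*z2 - z1*y2,
        r1*y2 - x1*z2 + y1*r2 + z1*x2,
        r1*z2 + x1*y2 - y1*x2 + z1*r2,
    ])

def rotate(q, r_unit):
    """Rotation Q' = R Q R* with a unit quaternion R."""
    return hamilton(hamilton(r_unit, q), conjugate(r_unit))
```

For instance, `hamilton(np.array([0., 1., 0., 0.]), np.array([0., 0., 1., 0.]))` returns `[0., 0., 0., 1.]`, i.e. $ij = k$, consistent with the basis products above.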
2.2. Quaternion Autoencoder (QAE)
The QAE is a three-layered neural network made of an encoder
and a decoder (see Figure 1-(a)). The well known autoencoder
(AE) is obtained with the same algorithm but with real numbers.
Figure 1: Illustration of the quaternion autoencoders: (a) the quaternion autoencoder, mapping $Q_p$ through the hidden units $h_n$ to the reconstruction $\tilde{Q}_p$; (b) the quaternion denoising autoencoder, which first corrupts the input with $f(Q_p)$ into $Q_p^{\mathrm{corrupted}}$.
Given a set of $P$ normalized inputs $Q_p^{\triangleleft}$ (denoted $Q_p$ for convenience, $1 \le p \le P$) of size $M$, the encoder computes a hidden representation $h_n$ of $Q_p = \{Q_m\}_{m=1}^{M}$ ($N$ is the number of hidden units):
$h_n = \alpha\left(\sum_{m=1}^{M} w^{(1)}_{nm} \otimes Q_m + \theta^{(1)}_n\right)$
where $w^{(1)}$ is an $N \times M$ weight matrix and $\theta^{(1)}$ is an $N$-dimensional bias vector; $\alpha(Q)$ is the sigmoid activation function of the quaternion $Q$ [14]: $\alpha(Q) = \mathrm{sig}(r)1 + \mathrm{sig}(x)i + \mathrm{sig}(y)j + \mathrm{sig}(z)k$, with $\mathrm{sig}(q) = \frac{1}{1 + e^{-q}}$. The decoder attempts to reconstruct the input vector $Q_p$ from the hidden vector $h_n$ to obtain the output vector $\tilde{Q}_p = \{\tilde{Q}_m\}_{m=1}^{M}$:
$\tilde{Q}_m = \alpha\left(\sum_{n=1}^{N} w^{(2)}_{mn} \otimes h_n + \theta^{(2)}_m\right)$
where the reconstructed quaternion vector $\tilde{Q}_p$ is $M$-dimensional, $w^{(2)}$ is an $M \times N$ weight matrix and $\theta^{(2)}$ is an $M$-dimensional bias vector. During learning, the QAE attempts to reduce the reconstruction error $e$ between $\tilde{Q}_p$ and $Q_p$ by using the traditional mean square error (MSE) [15], $e_{\mathrm{MSE}}(\tilde{Q}_m, Q_m) = \lVert \tilde{Q}_m - Q_m \rVert^2$, minimizing the total reconstruction error $L_{\mathrm{MSE}} = \frac{1}{P} \sum_{p=1}^{P} \sum_{m=1}^{M} e_{\mathrm{MSE}}(\tilde{Q}_m, Q_m)$ with respect to the parameter (quaternion) set $\Theta = \{w^{(1)}, \theta^{(1)}, w^{(2)}, \theta^{(2)}\}$.
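To make the encoder and decoder equations concrete, here is a minimal sketch of the forward pass under our own conventions (not the authors' implementation): a quaternion vector of size $M$ is an (M, 4) NumPy array, `hamilton()` is reused from the Section 2.1 sketch, and the split activation $\alpha$ applies the sigmoid independently to each of the four components.

```python
import numpy as np

def sig(a):
    """Component-wise logistic sigmoid."""
    return 1.0 / (1.0 + np.exp(-a))

def qlayer(W, theta, Q):
    """One quaternion layer: alpha(sum_m W[n, m] (x) Q[m] + theta[n]).

    W: (N, M, 4) quaternion weights; theta: (N, 4) biases; Q: (M, 4) inputs.
    Uses hamilton() from the Section 2.1 sketch; the split activation
    applies sig() independently to r, x, y and z.
    """
    N, M = W.shape[0], Q.shape[0]
    out = np.empty((N, 4))
    for n in range(N):
        acc = theta[n].copy()
        for m in range(M):
            acc += hamilton(W[n, m], Q[m])
        out[n] = sig(acc)
    return out

def qae_forward(W1, th1, W2, th2, Q):
    """Encoder then decoder; returns the reconstruction of Q."""
    h = qlayer(W1, th1, Q)       # hidden representation h_n, shape (N, 4)
    return qlayer(W2, th2, h)    # reconstructed vector, shape (M, 4)

def l_mse(Q_rec, Q):
    """Reconstruction error sum_m ||Q_rec_m - Q_m||^2 for one pattern."""
    return float(np.sum((Q_rec - Q) ** 2))
```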
2.3. Quaternion Denoising Autoencoder (QDAE)
Traditional autoencoders fail to: 1) separate robust features and
relevant information to residual noise [9] from small corpus; 2)
take into account the temporal and internal structures of spo-
ken documents. Therefore, denoising autoencoders (DAE) [9]
corrupt inputs using specific noises during the encoding and
decode this representation to reconstruct the non-corrupted in-
puts. DAE models learn a robust generative model to bet-
ter represent small sized corpus of documents; [2] propose
to learn internal and temporal structure representation with
a quaternion multilayer perceptron (QMLP). The paper pro-
poses to address issues related to small sized corpus (such
as DAE) and to temporal structure (QMLP) by introducing
a quaternion denoising autoencoder called QDAE. Figure 1-
(b) shows an input vector Qpartificially corrupted by a noise
function f() applied to each index Qmof Qpas f(Qp)=
{f(Q1),...,f(Qm),...,f(QM)}.
Standard noises adapted to real numbers:
• Additive isotropic Gaussian (G): adds a different Gaussian noise to each input value $(Q_1, \ldots, Q_m, \ldots, Q_M)$ of a fixed proportion of patterns $Q_p$, with the means and variances of the Gaussian distribution bounded by the corresponding averages over all the patterns of the same prediction theme as $Q_p$.
• Salt-and-pepper (SP): a fixed proportion of the values of all patterns $Q_p$ is randomly set to 1 or 0.
• Dropout (D): a fixed proportion of the values of all patterns $Q_p$ is randomly set to 0.
Given a noise function $f()$, the corrupted version of the quaternion $Q_m = r1 + xi + yj + zk$ is $Q_m^{\mathrm{corrupted}} = f(Q_m) = f(r)1 + f(x)i + f(y)j + f(z)k$ (a sketch of these component-wise corruptions is given below). Nonetheless, such a representation does not take into account the specificities of quaternion algebra, since these noises were designed for real numbers. Indeed, a quaternion represents a rotation over the $\mathbb{R}^3$ space. Therefore, basic additive and non-angular noises, such as a Gaussian noise, only represent a one-dimensional translation and do not take advantage of the rotation defined by a quaternion.
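The sketch below illustrates these standard real-valued noises applied component-wise to a quaternion input vector (an (M, 4) array), following $f(Q_m) = f(r)1 + f(x)i + f(y)j + f(z)k$. The proportions and Gaussian parameters here are hypothetical placeholders; in the paper they are bounded by per-theme statistics.

```python
import numpy as np

rng = np.random.default_rng(0)

def gaussian_noise(Q, mean=0.0, std=0.1):
    """Additive isotropic Gaussian (G) on each of r, x, y, z.

    In the paper the mean/std are bounded by per-theme averages;
    the scalar defaults used here are hypothetical.
    """
    return Q + rng.normal(mean, std, size=Q.shape)

def salt_and_pepper(Q, proportion=0.1):
    """Salt-and-pepper (SP): a fixed proportion of values set to 0 or 1."""
    Qc = Q.copy()
    mask = rng.random(Q.shape) < proportion
    Qc[mask] = rng.integers(0, 2, size=int(mask.sum())).astype(float)
    return Qc

def dropout_noise(Q, proportion=0.1):
    """Dropout (D): a fixed proportion of values set to 0."""
    Qc = Q.copy()
    Qc[rng.random(Q.shape) < proportion] = 0.0
    return Qc
```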
Quaternion Gaussian Angular Noise (GAN):
The GAN takes advantage of the quaternion algebra (rotation) and is proposed to address the drawback of noise functions (adding a noise to each quaternion) that are weakly adapted to the rotation definition of quaternions. The GAN noise function is based on the rotation of a quaternion vector $Q_p$ around an axis defined in a cone centered on $m^t$ and delimited by $v^t$, where $m^t$ is the mean and $v^t$ the variance of the patterns $Q_p$ belonging to theme $t$. Let $R_p^t$ be a Gaussian-noised quaternion for the theme $t$, defined as:
$R_p^t = m^t + \mathcal{N}(0, I) \, v^t. \quad (1)$
The Gaussian angular noise function $f()$ rotates $Q_p$ belonging to the theme $t$ around $R_p^t$ to obtain the corrupted quaternion $Q_p^{\mathrm{corrupted}}$:
$f(Q_p) = \frac{R_p^t \otimes Q_p \otimes R_p^{t*}}{\lvert R_p^t \otimes Q_p \rvert} \quad (2)$
and
$f(Q_p) = \begin{cases} Q_p & \text{if } R_p^t = Q_p \\ Q_p^{\mathrm{corrupted}} & \text{otherwise} \end{cases} \quad (3)$
It is worth noticing in Eq. (3) that $f$ is idempotent when $R_p^t = Q_p$, so that the dialogue pattern is kept unaltered.
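The following sketch gives one possible reading of Eqs. (1)-(3) in Python (again our own illustration, reusing `hamilton()` and `conjugate()` from the Section 2.1 sketch); treating the per-theme statistics $m^t$ and $v^t$ as 4-dimensional arrays is our assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

def gan_corrupt(Q, m_t, v_t):
    """Gaussian angular noise for one quaternion Q = [r, x, y, z].

    m_t, v_t: mean and variance of the patterns of theme t, taken here
    as 4-dimensional arrays (our interpretation of the paper).
    """
    # Eq. (1): Gaussian-noised rotation quaternion R_t = m_t + N(0, I) v_t
    R = m_t + rng.standard_normal(4) * v_t
    # Eq. (3): if R equals Q, keep the dialogue pattern unaltered
    if np.allclose(R, Q):
        return Q
    # Eq. (2): rotate Q around R, normalized by |R (x) Q|
    rotated = hamilton(hamilton(R, Q), conjugate(R))
    return rotated / np.linalg.norm(hamilton(R, Q))
```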
3. Experimental protocol
The effectiveness of the proposed QDAE-GAN is evaluated on a theme identification task of telephone conversations from the DECODA corpus, detailed in Section 3.1. Section 3.2 describes the dialogue features employed as inputs of the autoencoders, as well as the configuration of each neural network.
3.1. Spoken Dialogue dataset
The DECODA corpus [16] contains real-life human-human telephone conversations collected in the CSS of the Paris transportation system (RATP). It is composed of 1,242 telephone conversations, corresponding to about 74 hours of signal, split into a train set (740 dialogues), a development set (dev, 175 dialogues) and a test set (327 dialogues). Each conversation is annotated with one of 8 themes. Themes correspond to customer problems or inquiries about itinerary, lost and found, time schedules, transportation cards, state of the traffic, fares, fines and special offers. The LIA-Speeral Automatic Speech Recognition (ASR) system [17] is used to automatically transcribe each conversation. Acoustic model parameters are estimated from 150 hours of telephone speech. The vocabulary contains 5,782 words. A 3-gram language model (LM) is obtained by adapting a basic LM with the training set transcriptions. Automatic transcriptions are obtained with word error rates (WERs) of 33.8%, 45.2% and 49% on the train, dev and test sets respectively. These high rates are mainly due to the speech disfluencies of casual users and to the adverse acoustic environments of metro stations and streets.
3.2. Input features and Neural Networks settings
The experiments compare our proposed QDAE with a DAE based on real numbers [7] and with the QMLP [2].
Input features: [2] showed that an LDA [18] space of 25 topics, together with a specific user-agent document segmentation in which the quaternion $Q = r1 + xi + yj + zk$ is built with the user part of the dialogue in the first complex value $x$, the agent part in $y$ and the topic prior of the whole dialogue in $z$, achieves the best results over 10 folds with the QMLP. We therefore keep this segmentation and concatenate the 10 representations of size 25 into a single input vector of size $M = 250$. Indeed, the compression of the 10 folds into a single input vector gives the DAEs more features for generalizing patterns (a hypothetical sketch of this assembly is given below). For a fair comparison, a QMLP with the same input vector is also tested.
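A possible sketch of this input assembly (the function name and the handling of the real part are our assumptions, as the text does not specify what the real part $r$ contains):

```python
import numpy as np

def build_input(user_topics, agent_topics, dialogue_prior,
                n_folds=10, n_topics=25):
    """Assemble the quaternion input vector of size M = n_folds * n_topics.

    Each argument is an (n_folds, n_topics) array of LDA topic
    representations. The user part goes to x, the agent part to y and
    the whole-dialogue topic prior to z; the real part r is left at
    zero here, as its content is not specified in this description.
    """
    M = n_folds * n_topics
    Q = np.zeros((M, 4))
    Q[:, 1] = user_topics.reshape(M)      # x: user part of the dialogue
    Q[:, 2] = agent_topics.reshape(M)     # y: agent part
    Q[:, 3] = dialogue_prior.reshape(M)   # z: topic prior of the dialogue
    return Q
```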
QDAE and QMLP configurations: The appropriate size of the hidden layer $h$ of the QDAE has to be chosen by varying the number of neurons in the hidden layer, which changes the amount and the shape of the features given to the classifier. Different autoencoders have thus been learned by varying the hidden layer size from 10 to 120. Finally, a QMLP classifier is trained with 8 hidden neurons, with the hidden layer of the QAE or QDAE as input vector, and with 8 output neurons (the 8 themes $t$ of the DECODA corpus).
4. Experiments and Results
The proposed quaternion denoising autoencoder (QDAE) is compared to the quaternion autoencoder (QAE) in Section 4.1, on the theme identification task of telephone conversations described in Section 3.1. For a fair comparison, the QDAE is then compared to the real-valued AE and MLP in Section 4.2.
4.1. QDAE with additive and angular noises
Figure 2 shows the accuracies obtained with the denoising quaternion encoder-decoder on the development and test sets for the theme identification task of telephone conversations of the DECODA project.
The first remark is that the results obtained on the development dataset, reported in Figure 2, are similar whatever the model employed.
Figure 2: Accuracies in % obtained on the development (left, "SEG 1 on Dev") and test (right, "SEG 1 on Test") sets by varying the number of neurons (from 10 to 120) in the hidden layer of the QAE and of the QDAE with the different noises (GAN, G, D, SP).
Nonetheless, the proposed QDAE-GAN gives better results on the test dataset than any other method, and is more robust to variations of the hidden layer size. Table 1 confirms the results observed for the QDAE-GAN, with gains of more than 3.5% and 2.5% over the QDAE-G and the QDAE-D respectively. As expected, the traditional noises give worse results than the adapted noise, due to the specificities of quaternion algebra: an additive real-valued Gaussian noise applied to a quaternion does not take advantage of the rotations that define quaternions. It is worth underlining the poor performances reported with the QDAE-SP and the QDAE-D, which are based on neither real- nor quaternion-algebra specificities: these poor performances are explained by the high impact of the zero values propagated by the Hamilton product (see Section 2.1), which increase the number of dead neurons through the neural network. Finally, the non-corrupted QAE gives a good "best test" value on the test dataset (83%) compared to the other QDAEs, confirming that real-valued noises are poorly suited to quaternion-based autoencoders.
Models     Dev.   Best Test   Real Test
QAE        89.1   83.0        80.9
QDAE-SP    88.5   82.5        81.2
QDAE-G     88.5   83.1        81.5
QDAE-D     89.1   83.0        82.5
QDAE-GAN   90.2   85.2        85.2

Table 1: Accuracies in % obtained by the proposed quaternion encoder-decoders on the DECODA dataset.
4.2. QDAE vs. real-valued neural networks
For a fair comparison, this original QDAE-GAN approach is compared to real-valued autoencoders and traditional neural networks; the results are reported in Table 2.
Table 2 shows that the non-adapted noises and the standard QAE give worse performances than the QMLP, because of the lack of unseen compressed information they provide to the classifier. It is worth emphasizing that the best accuracies observed are obtained by the QDAE-GAN, representing a gain of 11% over the DAE [7]. The results reported in Table 2 demonstrate the global improvement in performance of the quaternion-valued neural networks over the real-valued ones. Indeed, the QMLP also gives an important gain of more than 4% over the MLP, and the QDAE-GAN obtains a gain of 3.2% compared to the DSAE.

Models     Type   Dev.   Best Test   Real Test   Impr.
MLP [2]    R      85.2   79.6        79.6        -
QMLP       Q      89.7   83.7        83.7        +4.1
AE [7]     R      -      -           81.0        -
QAE        Q      89.1   83.0        80.9        -0.1
DAE [7]    R      -      -           74.3        -
DSAE [7]   R      88.0   83.0        82.0        +7.7
QDAE-GAN   Q      90.2   85.2        85.2        +10.9

Table 2: Summary of accuracies in % obtained by different neural networks on the DECODA framework.
5. Conclusion
Summary. This paper proposes a promising denoising encoder-decoder based on the quaternion algebra, coupled with an original and well-adapted quaternion Gaussian angular noise. The initial intuition that the QDAE better captures latent relations between input features and can generalize from small corpora has been demonstrated. It has been shown that the noises applied during learning must be adapted to the quaternion algebra to give better results and truly expose the full potential of quaternion neural networks. Moreover, this paper shows that the quaternion-valued neural networks always perform better than the real-valued ones, achieving impressive accuracies on the small DECODA corpus with fewer input features and fewer neural parameters.
Limitations and Future Work. Document segmentation is a crucial issue when it comes to better capturing latent, temporal and spatial information, and thus needs more investigation to expose the potential of quaternion-based models. Moreover, the lack of GPU tools to manage quaternions implies a massive implementation time to deal with bigger corpora of spoken documents. Future work will investigate other quaternion-adapted noises, as well as other quaternion-based neural networks that better take into consideration the internal structure of documents, such as recurrent neural networks and Long Short-Term Memory neural networks.
6. References
[1] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature,
vol. 521, no. 7553, pp. 436–444, 2015.
[2] T. Parcollet, M. Morchid, P.-M. Bousquet, R. Dufour, G. Linarès, and R. De Mori, "Quaternion neural networks for spoken language understanding," in Spoken Language Technology Workshop (SLT), 2016 IEEE. IEEE, 2016, pp. 362–368.
[3] M. Morchid, G. Linarès, M. El-Beze, and R. De Mori, "Theme identification in telephone service conversations using quaternions of speech features," in Interspeech. ISCA, 2013.
[4] I. Kantor, A. Solodovnikov, and A. Shenitzer, Hypercomplex num-
bers: an elementary introduction to algebras. Springer-Verlag,
1989.
[5] T. Isokawa, N. Matsui, and H. Nishimura, “Quaternionic neural
networks: Fundamental properties and applications,” Complex-
Valued Neural Networks: Utilizing High-Dimensional Parame-
ters, pp. 411–439, 2009.
[6] P. Vincent, H. Larochelle, Y. Bengio, and P.-A. Manzagol, “Ex-
tracting and composing robust features with denoising autoen-
coders,” in Proceedings of the 25th international conference on
Machine learning. ACM, 2008, pp. 1096–1103.
[7] K. Janod, M. Morchid, R. Dufour, G. Linares, and R. De Mori,
“Deep stacked autoencoders for spoken language understanding,”
Matrix, vol. 1, p. 2, 2016.
[8] X. Lu, Y. Tsao, S. Matsuda, and C. Hori, “Speech enhancement
based on deep denoising autoencoder.” in Interspeech, 2013, pp.
436–440.
[9] P. Vincent, H. Larochelle, I. Lajoie, Y. Bengio, and P.-A. Man-
zagol, “Stacked denoising autoencoders: Learning useful repre-
sentations in a deep network with a local denoising criterion,”
Journal of Machine Learning Research, vol. 11, no. Dec, pp.
3371–3408, 2010.
[10] G. E. Hinton and R. R. Salakhutdinov, “Reducing the dimension-
ality of data with neural networks,” Science, vol. 313, no. 5786,
pp. 504–507, 2006.
[11] J. B. Kuipers, Quaternions and rotation sequences. Princeton University Press, Princeton, NJ, USA, 1999.
[12] F. Zhang, “Quaternions and matrices of quaternions,” Linear al-
gebra and its applications, vol. 251, pp. 21–57, 1997.
[13] J. Ward, Quaternions and Cayley numbers: Algebra and applica-
tions. Springer, 1997, vol. 403.
[14] P. Arena, L. Fortuna, G. Muscato, and M. G. Xibilia, “Multilayer
perceptrons to approximate quaternion valued functions,” Neural
Networks, vol. 10, no. 2, pp. 335–342, 1997.
[15] Y. Bengio, “Learning deep architectures for ai,” Foundations and
trends® in Machine Learning, vol. 2, no. 1, pp. 1–127, 2009.
[16] F. Bechet, B. Maza, N. Bigouroux, T. Bazillon, M. El-Beze,
R. De Mori, and E. Arbillot, “Decoda: a call-centre human-human
spoken conversation corpus.” in LREC, 2012, pp. 1343–1347.
[17] G. Linares, P. Nocéra, D. Massonie, and D. Matrouf, "The LIA speech recognition system: From 10xRT to 1xRT," in Text, Speech and Dialogue. Springer, 2007, pp. 302–308.
[18] D. M. Blei, A. Y. Ng, and M. I. Jordan, “Latent dirichlet alloca-
tion,” the Journal of machine Learning research, vol. 3, pp. 993–
1022, 2003.