DEEP QUATERNION NEURAL NETWORKS
FOR SPOKEN LANGUAGE UNDERSTANDING
Titouan Parcollet (1,2), Mohamed Morchid (1), Georges Linarès (1)
(1) LIA, University of Avignon (France)
(2) Orkis (France)
{firstname.lastname}@univ-avignon.fr
ABSTRACT
Deep Neural Networks (DNN) have received great interest from researchers due to their capability to construct robust abstract representations of heterogeneous documents in a latent subspace. Nonetheless, mere real-valued deep neural networks require an appropriate adaptation, such as the convolution process, to capture latent relations between input features. Moreover, real-valued deep neural networks reveal little in the way of document internal dependencies, by considering the words or topics contained in a document only as isolated basic elements. Quaternion-valued multi-layer perceptrons (QMLP) and autoencoders (QAE) have been introduced to capture such latent dependencies, as well as to represent multidimensional data. Nonetheless, a three-layered neural network does not benefit from the high abstraction capability of DNNs. This paper first proposes to extend the hyper-complex algebra to deep neural networks (QDNN) and then introduces pre-trained deep quaternion neural networks (QDNN-AE) with dedicated quaternion encoder-decoders (QAE). The experiments conducted on a theme identification task of spoken dialogues from the DECODA data set show, inter alia, that the QDNN-AE reaches a promising gain of 2.2% compared to the standard real-valued DNN-AE.

Index Terms — Quaternions, deep neural networks, spoken language understanding, autoencoders, machine learning.
1. INTRODUCTION
Deep neural networks (DNN) have become ubiquitous in a broad spectrum of domain-specific applications, such as image processing [1, 2], speech recognition [3], or spoken language understanding (SLU) [4]. State-of-the-art approaches involve different neural-based structures to construct abstract representations of documents in a low-dimensional subspace, such as deep neural networks [5], recurrent neural networks (RNN) [6, 7, 8, 9], convolutional neural networks (CNN) [1], and, more recently, generative adversarial neural networks (GAN) [10]. However, in a standard real-valued neural structure, the latent relations between input features are difficult to represent. Indeed, multidimensional features have to be reduced to a one-dimensional vector before the learning process, whereas a more appropriate solution is to process a multidimensional input as a single homogeneous entity. In other words, real-valued representations reveal little in the way of document internal structure, by considering the words or topics contained in a document only as isolated basic elements. Therefore, quaternion multi-layer perceptrons (QMLP) [11, 12, 13] and quaternion autoencoders (QAE) [14] have been introduced to capture such latent dependencies, thanks to the four dimensions of hyper-complex numbers alongside the Hamilton product [15]. Nonetheless, previous quaternion-based studies focused on three-layered neural networks, while the efficiency and the effectiveness of DNNs have already been demonstrated [16, 5].
Therefore, this paper first proposes to extend QMLPs to deep quaternion neural networks (QDNN) for theme identification of telephone conversations. Indeed, the high abstraction capability of DNNs, combined with the quaternion representation of latent relations, fully exposes the potential of hyper-complex-based neural structures. Nevertheless, in [17], the authors highlighted the risk of converging to a poor local optimum and the high probability of overfitting when training deep neural networks. To alleviate these weaknesses, different methods have been proposed, such as adding noise during learning to prevent the overfitting phenomenon [18], or a pre-training process to ease convergence towards a better optimum [19], with a Restricted Boltzmann Machine (RBM) [20] or an encoder-decoder neural network (AE) [21].
The paper then proposes to compare a randomly initialized QDNN with a greedy layer-wise pre-trained QDNN using QAEs, called “QDNN-AE”, to fully expose the capabilities of the quaternion deep neural structure on an SLU task. The experiments are conducted on the DECODA telephone conversations framework and show a promising gain of the QDNN compared to the QMLP. Moreover, the experiments underline the impact of pre-training a QDNN with a dedicated autoencoder. Finally, the proposed quaternion-based models are compared to the real-valued ones.
The rest of the paper is organized as follows: Section 2 presents the quaternion deep neural networks and quaternion encoder-decoders, and Section 3 details the experimental protocol. The results are discussed in Section 4 before concluding in Section 5.
2. DEEP QUATERNION NEURAL NETWORKS
(QDNN) AND QUATERNION AUTOENCODERS
(QAE)
The proposed QDNN combines the well-known real-valued deep neural network, i.e., a multilayer perceptron (MLP) with more than one hidden layer, with the quaternion algebra. Section 2.1 details the quaternion fundamental properties required to define and understand the QDNN algorithms presented in Section 2.2.
2.1. Quaternion algebra
The quaternion algebra $\mathbb{Q}$ is an extension of the complex numbers defined in a four-dimensional space as a linear combination of four basis elements denoted 1, $\mathbf{i}$, $\mathbf{j}$, $\mathbf{k}$ to represent a rotation. A quaternion $Q$ is written as:

$$Q = r1 + x\mathbf{i} + y\mathbf{j} + z\mathbf{k} \quad (1)$$

In a quaternion, $r$ is its real part while $x\mathbf{i} + y\mathbf{j} + z\mathbf{k}$ is the imaginary part ($I$), or the vector part. A set of basic quaternion properties is needed for the QDNN definition:
- all products of $\mathbf{i}$, $\mathbf{j}$, $\mathbf{k}$: $\mathbf{i}^2 = \mathbf{j}^2 = \mathbf{k}^2 = \mathbf{i}\mathbf{j}\mathbf{k} = -1$
- the quaternion conjugate $Q^*$ of $Q$ is: $Q^* = r1 - x\mathbf{i} - y\mathbf{j} - z\mathbf{k}$
- the dot product between two quaternions $Q_1$ and $Q_2$ is $\langle Q_1, Q_2 \rangle = r_1 r_2 + x_1 x_2 + y_1 y_2 + z_1 z_2$
- the quaternion norm: $|Q| = \sqrt{r^2 + x^2 + y^2 + z^2}$
- the normalized quaternion: $Q^{\triangleleft} = \frac{Q}{|Q|}$
- the Hamilton product between $Q_1$ and $Q_2$ encodes latent dependencies and is defined as follows:

$$Q_1 \otimes Q_2 = (r_1 r_2 - x_1 x_2 - y_1 y_2 - z_1 z_2) + (r_1 x_2 + x_1 r_2 + y_1 z_2 - z_1 y_2)\mathbf{i} + (r_1 y_2 - x_1 z_2 + y_1 r_2 + z_1 x_2)\mathbf{j} + (r_1 z_2 + x_1 y_2 - y_1 x_2 + z_1 r_2)\mathbf{k} \quad (2)$$

This encoding capability has been confirmed in [22]. Indeed, the authors have demonstrated the rotation, transformation and scaling capabilities of a single quaternion due to the Hamilton product. Moreover, it performs an interpolation between two rotations following a geodesic over a sphere in the $\mathbb{R}^3$ space.
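To make these definitions concrete, the following minimal Python sketch (an illustration written for this section, not code from the authors) implements the conjugate, norm, normalization and Hamilton product of equation (2) on quaternions stored as (r, x, y, z) tuples; the final assertions check the defining relation $i^2 = j^2 = k^2 = ijk = -1$.

```python
import numpy as np

def conjugate(q):
    """Quaternion conjugate: Q* = r1 - xi - yj - zk."""
    r, x, y, z = q
    return (r, -x, -y, -z)

def norm(q):
    """Quaternion norm |Q| = sqrt(r^2 + x^2 + y^2 + z^2)."""
    return np.sqrt(sum(c * c for c in q))

def normalize(q):
    """Normalized quaternion Q / |Q| (Section 2.1)."""
    n = norm(q)
    return tuple(c / n for c in q)

def hamilton(q1, q2):
    """Hamilton product Q1 (x) Q2 of equation (2)."""
    r1, x1, y1, z1 = q1
    r2, x2, y2, z2 = q2
    return (r1 * r2 - x1 * x2 - y1 * y2 - z1 * z2,
            r1 * x2 + x1 * r2 + y1 * z2 - z1 * y2,
            r1 * y2 - x1 * z2 + y1 * r2 + z1 * x2,
            r1 * z2 + x1 * y2 - y1 * x2 + z1 * r2)

# Quick check of the defining relation i^2 = j^2 = k^2 = ijk = -1:
i, j, k = (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)
assert hamilton(i, i) == (-1, 0, 0, 0)
assert hamilton(hamilton(i, j), k) == (-1, 0, 0, 0)
```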
Given a segmentation $S = \{s_1, s_2, s_3, s_4\}$ of a document $p \in P$, following the document segmentation detailed in [12], and a set of topics from a latent Dirichlet allocation (LDA) [23] $z = \{z_1, \dots, z_i, \dots, z_T\}$, each topic $z_i$ in a document $p$ is represented by the quaternion:

$$Q_p(z_i) = x^1_p(z_i)1 + x^2_p(z_i)\mathbf{i} + x^3_p(z_i)\mathbf{j} + x^4_p(z_i)\mathbf{k} \quad (3)$$

where $x^m_p(z_i)$ is the prior of the topic $z_i$ in the segment $s_m$ of the document $p$, as described in [12]. This quaternion is then normalized to obtain the input $Q^{\triangleleft}_p(z_i)$ of the QMLP.
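As an illustration of equation (3), the sketch below builds the normalized quaternion inputs from the per-segment LDA topic priors; the array layout and function name are assumptions made for this example, the exact feature extraction being the one of [12].

```python
import numpy as np

def topic_quaternions(segment_priors):
    """Builds the normalized quaternion inputs of equation (3).

    segment_priors: array of shape (4, T) whose row m holds the LDA topic
    priors x^m_p(z_i) of the four segments s_1..s_4 of a document
    (hypothetical layout). Returns an array of shape (T, 4): one
    normalized quaternion (r, x, y, z) per topic.
    """
    q = np.asarray(segment_priors, dtype=float).T       # (T, 4)
    norms = np.linalg.norm(q, axis=1, keepdims=True)     # |Q| per topic
    return q / np.maximum(norms, 1e-12)                  # Q / |Q|

# Example with T = 25 LDA topics, as in the DECODA experiments:
priors = np.random.dirichlet(np.ones(25), size=4)        # dummy priors
Q_p = topic_quaternions(priors)                          # shape (25, 4)
```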
More about hyper-complex numbers can be found in [24,
25, 26] and more precisely about quaternion algebra in [27].
2.2. Deep Quaternion Neural Networks (QDNN)
This section details the QDNN algorithms and structure (Figure 1). The QDNN differs from the real-valued DNN in each learning subprocess, and all elements of the QDNN (inputs $Q_p$, labels $t$, weights $w$, biases $b$, outputs $\gamma$, ...) are quaternions.

Fig. 1. Illustration of a Quaternion Deep Neural Network with 2 hidden layers ($M = 4$): the inputs $Q_p$ feed the layer $l_1$ and the outputs $t$ are produced by the layer $l_M$.
Activation function
The activation function $\beta$ is the split [11] ReLU function $\alpha$, applied to each element of the quaternion $Q = r1 + x\mathbf{i} + y\mathbf{j} + z\mathbf{k}$ as follows:

$$\beta(Q) = \alpha(r)1 + \alpha(x)\mathbf{i} + \alpha(y)\mathbf{j} + \alpha(z)\mathbf{k} \quad (4)$$

where $\alpha$ is

$$\alpha(x) = \max(0, x) \quad (5)$$
Forward phase
Let $N_l$ be the number of neurons contained in the layer $l$ ($1 \le l \le M$) and $M$ be the number of layers of the QDNN, including the input and the output layers. $\theta^l_n$ is the bias of the neuron $n$ ($1 \le n \le N_l$) of the layer $l$. Given a set of $P$ normalized quaternion input patterns $Q^{\triangleleft}_p$ ($1 \le p \le P$, denoted $Q_p$ for convenience in the rest of the paper) and a set of labels $t_p$ associated with each input $Q_p$, the output $\gamma^l_n$ (with $\gamma^0_n = Q^n_p$ and $\gamma^M_n = t_n$) of the neuron $n$ of the layer $l$ is given by:

$$\gamma^l_n = \beta(S^l_n), \quad \text{with} \quad S^l_n = \sum_{m=0}^{N_{l-1}} w^l_{nm} \otimes \gamma^{l-1}_m + \theta^l_n \quad (6)$$
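A minimal sketch of one QDNN layer, assuming quaternions are stored as arrays of their four components, is given below; it applies the Hamilton product of equation (2) inside the weighted sum of equation (6) and the split ReLU of equations (4)-(5). Variable names and array shapes are illustrative, not the authors' implementation.

```python
import numpy as np

def split_relu(q):
    """Split ReLU of equations (4)-(5): ReLU applied to each component."""
    return np.maximum(q, 0.0)

def qdnn_layer_forward(gamma_prev, W, theta):
    """One QDNN layer (equation (6)): S_n = sum_m w_nm (x) gamma_m + theta_n.

    gamma_prev: (N_prev, 4) quaternion outputs of the previous layer.
    W:          (N, N_prev, 4) quaternion weights w_nm.
    theta:      (N, 4) quaternion biases.
    Returns gamma = beta(S) of shape (N, 4).
    """
    r1, x1, y1, z1 = (W[..., c] for c in range(4))          # (N, N_prev)
    r2, x2, y2, z2 = (gamma_prev[:, c] for c in range(4))   # (N_prev,)
    # Hamilton product of every weight with the matching input, summed over m:
    S = np.stack([
        (r1 * r2 - x1 * x2 - y1 * y2 - z1 * z2).sum(axis=1),
        (r1 * x2 + x1 * r2 + y1 * z2 - z1 * y2).sum(axis=1),
        (r1 * y2 - x1 * z2 + y1 * r2 + z1 * x2).sum(axis=1),
        (r1 * z2 + x1 * y2 - y1 * x2 + z1 * r2).sum(axis=1),
    ], axis=1) + theta                                       # (N, 4)
    return split_relu(S)
```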
Learning phase
The error $e$ observed between the expected outcome $t$ and the result of the forward phase $\gamma^M$ is then evaluated for the output layer ($l = M$) as follows:

$$e^l_n = t_n - \gamma^l_n \quad (7)$$

and for the hidden layers ($2 \le l \le M - 1$):

$$e^l_n = \sum_{h=1}^{N_{l+1}} w^{l+1}_{h,n} \otimes \delta^{l+1}_h \quad (8)$$

The gradient $\delta$ is computed with:

$$\delta^l_n = e^l_n \times \frac{\partial \beta(S^l_n)}{\partial S^l_n}, \quad \text{where} \quad \frac{\partial \beta(S^l_n)}{\partial S^l_n} = \begin{cases} 1, & \text{if } S^l_n > 0 \\ 0, & \text{otherwise} \end{cases} \quad (9)$$
Update phase
The weights $w^l_{n,m}$ and the bias values $\theta^l_n$ are respectively updated to $\hat{w}^l_{n,m}$ and $\hat{\theta}^l_n$:

$$\hat{w}^l_{n,m} = w^l_{n,m} + \delta^l_n \otimes \beta^{\star}(S^l_n) \quad (10)$$

$$\hat{\theta}^l_n = \theta^l_n + \delta^l_n \quad (11)$$
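The following sketch illustrates equations (7)-(9), i.e., the output error, the error backpropagated through the Hamilton product, and the gradient obtained with the split ReLU derivative; the weight and bias updates of equations (10)-(11) are then applied on top of these quantities. Shapes and names are assumptions made for this illustration, not the authors' code.

```python
import numpy as np

def hamilton_batch(q1, q2):
    """Row-wise Hamilton product of two (N, 4) quaternion arrays."""
    r1, x1, y1, z1 = q1.T
    r2, x2, y2, z2 = q2.T
    return np.stack([r1 * r2 - x1 * x2 - y1 * y2 - z1 * z2,
                     r1 * x2 + x1 * r2 + y1 * z2 - z1 * y2,
                     r1 * y2 - x1 * z2 + y1 * r2 + z1 * x2,
                     r1 * z2 + x1 * y2 - y1 * x2 + z1 * r2], axis=1)

def output_error(t, gamma_M):
    """Equation (7): e^M_n = t_n - gamma^M_n, per quaternion component."""
    return t - gamma_M

def hidden_error(W_next, delta_next):
    """Equation (8): e^l_n = sum_h w^{l+1}_{h,n} (x) delta^{l+1}_h.

    W_next:     (N_next, N, 4) quaternion weights of the layer l+1.
    delta_next: (N_next, 4) gradients of the layer l+1.
    """
    N = W_next.shape[1]
    e = np.zeros((N, 4))
    for n in range(N):
        e[n] = hamilton_batch(W_next[:, n, :], delta_next).sum(axis=0)
    return e

def gradient(e, S):
    """Equation (9): delta = e * d beta(S)/dS (split ReLU derivative)."""
    return e * (S > 0.0)
```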
2.3. Quaternion Autoencoder (QAE)
The QAE [14] is a three-layered ($M = 3$) neural network made of an encoder and a decoder, where $N_1 = N_3$, as depicted in Figure 2. The well-known autoencoder (AE) is obtained with the same algorithm as the QAE, but with real numbers, and with the Hamilton product replaced by the mere dot product.
Given a set of $P$ normalized inputs $Q_p$ ($1 \le p \le P$), the encoder computes a hidden representation $l_2$ of $Q_p$, while the decoder attempts to reconstruct the input vector $Q_p$ from this hidden vector $l_2$ to obtain the output vector $\tilde{Q}_p$. The learning phase follows the algorithm previously described in Section 2.2. Indeed, the QAE attempts to reduce the reconstruction error $e_{\mathrm{MSE}}$ between $\tilde{Q}_p$ and $Q_p$ by using the traditional Mean Square Error (MSE) [19] between all the $m$ ($1 \le m \le N_1$) quaternions $Q_m$ and estimated $\tilde{Q}_m$ composing the pattern $Q_p$:

$$e_{\mathrm{MSE}}(\tilde{Q}_m, Q_m) = \| \tilde{Q}_m - Q_m \|^2 \quad (12)$$

in order to minimize the total reconstruction error $L_{\mathrm{MSE}}$:

$$L_{\mathrm{MSE}} = \frac{1}{P} \sum_{p \in P} \sum_{m \in M} e_{\mathrm{MSE}}(\tilde{Q}_m, Q_m) \quad (13)$$
Fig. 2. Illustration of a quaternion autoencoder: the input $Q_p$ (layer $l_1$) is encoded into the hidden layer $l_2$ and decoded into the output $\tilde{Q}_p$ (layer $l_3$).
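A short sketch of the reconstruction loss of equations (12)-(13), assuming the quaternion patterns are stored as arrays of shape (P, N1, 4), is given below; it is an illustration rather than the authors' implementation.

```python
import numpy as np

def quaternion_mse(Q_hat, Q):
    """Reconstruction loss of equations (12)-(13).

    Q_hat, Q: arrays of shape (P, N1, 4) holding the reconstructed and
    original quaternion patterns (illustrative layout). Returns L_MSE:
    the squared quaternion norms ||Q~_m - Q_m||^2 summed over the pattern
    and averaged over the P patterns.
    """
    per_quaternion = np.sum((Q_hat - Q) ** 2, axis=-1)   # e_MSE, shape (P, N1)
    return per_quaternion.sum(axis=1).mean()             # average over patterns
```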
2.4. QDNN initialized with dedicated QAEs
The learning process of deep neural networks is impacted by a wide variety of issues related to the large number of parameters [3], such as vanishing or exploding gradients and the overfitting phenomenon. Different techniques have been proposed to address these drawbacks [18, 28, 29, 30]: additive noise, normalization preprocessing, adaptive learning rates, and pre-training. The pre-training process allows the neural network structure to converge faster, using a pre-learning phase on an unsupervised task, towards a better optimum. Indeed, an autoencoder is employed to learn each of the weight matrices composing the QDNN, except the last one, which is randomly initialized, as illustrated in Figure 3-(b)-(c). Therefore, the auto-encoded neural networks (DNN-AE, QDNN-AE) are able to effectively map the initial input features into a homogeneous subspace learned during an unsupervised training process with dedicated encoder-decoder neural networks.
Fig. 3. Illustration of a pre-trained Quaternion Deep Neural Network (a) based on 2 dedicated Quaternion encoder-decoders (b-c).
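The greedy layer-wise scheme of Figure 3 can be summarized by the following sketch: each hidden layer is initialized by training a dedicated QAE on the representation produced by the previously pre-trained layers, and the last, task-specific layer remains randomly initialized. The train_qae interface is hypothetical and stands for any QAE training routine following Section 2.3.

```python
def pretrain_qdnn(X, layer_sizes, train_qae):
    """Greedy layer-wise pre-training of a QDNN with dedicated QAEs (Fig. 3).

    X:           (P, N0, 4) normalized quaternion input patterns.
    layer_sizes: hidden layer sizes [N1, ..., N_{M-1}].
    train_qae:   callable(data, n_hidden) -> (W_enc, theta_enc, encode_fn);
                 trains one quaternion autoencoder and returns its encoder
                 parameters plus a function mapping data to the hidden code
                 (hypothetical interface, for illustration only).
    Returns the list of pre-trained encoder parameters; the last,
    task-specific layer is left randomly initialized, as in Section 2.4.
    """
    weights, data = [], X
    for n_hidden in layer_sizes:
        W_enc, theta_enc, encode = train_qae(data, n_hidden)  # unsupervised step
        weights.append((W_enc, theta_enc))
        data = encode(data)            # feed the code to the next QAE
    return weights                     # fine-tuning then starts from these
```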
3. EXPERIMENTAL PROTOCOL
The efficiency and the effectiveness of the proposed QDNN and QDNN-AE are evaluated on a spoken language understanding task of theme identification of telephone conversations, described in Section 3.1. The conversation data set comes from the DECODA framework detailed in Section 3.2. Section 3.3 describes the dialogue features employed as inputs of the autoencoders, as well as the configuration of each neural network.
3.1. Spoken Language Understanding task
The application considered in this paper, and depicted in Figure 4, concerns the automatic analysis of telephone conversations [31] between an agent and a customer in the call center of the Paris public transport authority (RATP) [32]. The most important speech analytics for the application are the conversation themes. Relying on the ontology provided by the RATP, we have identified 8 themes related to the main reason of the customer call, such as time schedules, traffic states, special offers, lost and found, etc.
A conversation involves a customer, who is calling from an unconstrained environment (typically from a train station or a street, using a mobile phone), and an agent, who is supposed to follow a conversation protocol to address customer requests or complaints. The conversation tends to vary according to the model of the agent protocol. This paper describes a theme identification method that relies on features related to this underlying structure of the agent-customer conversation.
Here, the identification of the conversation theme faces two main problems. First, speech signals may contain very noisy segments that are decoded by an Automatic Speech Recognition (ASR) system. In such difficult environments, ASR systems frequently fail and the theme identification component has to deal with high Word Error Rates (WER ≈ 49%).
Second, themes can be quite ambiguous, many speech acts being theme-independent (and sometimes confusing) due to the specificities of the applicative context: most conversations evoke traffic details or issues, station names, time schedules, etc. Moreover, some of the dialogues contain secondary topics, increasing the difficulty of dominant theme identification. On the other hand, dialogues are redundant and driven by the RATP agents, who try to follow, as much as possible, standard dialogue schemes.
3.2. Spoken dialogue data set
The DECODA corpus [32] contains real-life human-human telephone conversations collected in the Customer Care Service of the Paris transportation system (RATP). It is composed of 1,242 telephone conversations, corresponding to about 74 hours of signal, split into a train set (train - 739 dialogues), a development set (dev - 175 dialogues) and a test set (test - 327 dialogues). Each conversation is annotated with one of 8 themes. Themes correspond to customer problems or inquiries about itinerary, lost and found, time schedules, transportation cards, state of the traffic, fares, fines and special offers. The LIA-Speeral Automatic Speech Recognition (ASR) system [33] is used to automatically transcribe each conversation. Acoustic model parameters are estimated from 150 hours of telephone speech. The vocabulary contains 5,782 words. A 3-gram language model (LM) is obtained by adapting a basic LM with the training set transcriptions. Automatic transcriptions are obtained with word error rates (WERs) of 33.8%, 45.2% and 49% on the train, development and test sets respectively. These high rates are mainly due to speech disfluencies of casual users and to adverse acoustic environments in metro stations and streets.
Agent: Hello
Customer: Hello
Agent: Speaking...
Customer: I call you because I was fined today, but I still have an Imagine card suitable for zone 1 [...] I forgot to use my Navigo card for zone 2
Agent: You did not use your Navigo card, that is why they give you a fine, not for a zone issue [...]
Customer: Thanks, bye
Agent: Bye
Theme: Transportation cards
Fig. 4. Example of a dialogue from the DECODA corpus for the SLU task of theme identification, labeled with the theme “Transportation cards” (original in French; the English translation is shown above).
3.3. Input features and Neural Networks configurations
The experiments compare our proposed QDNN and QDNN-AE with the real-valued DNN and DNN-AE, and with the QMLP [12], an MLP made of a single hidden layer.
Input features: [12] shows that an LDA [23] space with 25 topics and a specific user-agent document segmentation, in which the quaternion $Q = r1 + x\mathbf{i} + y\mathbf{j} + z\mathbf{k}$ is built with the user part of the dialogue in the first imaginary value $x$, the agent part in $y$ and the topic prior of the whole dialogue in $z$, achieves the best results on 10 folds with the QMLP. Therefore, we keep this segmentation and concatenate the 10 representations of size 25 into a single input vector of size $|Q_p| = 250$. Indeed, gathering the 10 folds into a single input vector gives the QDNNs more features to generalize patterns. For a fair comparison, a QMLP with the same input vector is also tested.
Neural Network configurations: First of all, the appropriate size of a single hidden layer for both the DNN (MLP) and the QDNN (QMLP) has to be investigated by varying the number of neurons $N$ before extending to multiple layers. Different QMLPs and MLPs have thus been trained by varying the hidden layer size from 8 to 1,024. Finally, we trained multiple DNNs and QDNNs by varying the number of layers from 1 to 5. Indeed, it is not straightforward to investigate all the possible topologies using 8 to 1,024 neurons in 1 to 5 layers; therefore, each layer contains the same fixed number of neurons. During the experiments, a dropout rate of 50% is used for each layer to prevent overfitting.
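For illustration, the explored configurations can be enumerated as follows; the listed hidden sizes are the values visible in Figure 5, and the snippet is not the authors' training script.

```python
# Illustrative enumeration of the explored topologies (assumed values,
# not the authors' training script).
single_layer_sweep = [8, 256, 512, 1024]        # hidden sizes tried for MLP/QMLP
depth_sweep = [
    {"n_hidden_layers": depth, "neurons_per_layer": 512, "dropout": 0.5}
    for depth in range(1, 6)                     # 1 to 5 identical hidden layers
]
for cfg in depth_sweep:
    print(cfg)   # each configuration is trained for both the DNN and the QDNN
```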
4. EXPERIMENTAL RESULTS
Section 4.1 details the experiments carried out to find the “optimal” number of hidden neurons $N_l$ with a real-valued (MLP) and a quaternion-valued (QMLP) neural network. Then, the DNN/QDNN and their pre-trained equivalents DNN-AE/QDNN-AE are compared in Section 4.2. Finally, the performances of all the neural network models (real- and quaternion-valued) are reported in Section 4.3.
4.1. QMLP vs. MLP
Figure 5 shows the accuracies obtained on the development and test data sets with a real-valued and a quaternion-based neural network composed of a single hidden layer ($M = 3$). To stick to a realistic use case, the optimal number of neurons in the hidden layer is chosen with respect to the results obtained on the development data set, by varying the number of neurons in the hidden layer. The best accuracies on the development data set for both the MLP and the QMLP are observed with a hidden layer of 512 neurons. Indeed, the QMLP and the MLP reach an accuracy of 90.38% and 91.38% respectively. Moreover, the performances of both the MLP and the QMLP decrease once the hidden layer contains more than 512 neurons.
Fig. 5. Accuracies in % obtained on the development (left) and test (right) data sets by varying the number of neurons (from 8 to 1,024) in the hidden layer of the QMLP and MLP respectively (mean accuracies range from 81.3% to 82.4% on the development set and from 74.6% to 75.4% on the test set).
4.2. Quaternion- and real-valued Deep Neural Networks
This section details the performances obtained by the DNN, QDNN, DNN-AE and QDNN-AE for 1, 2, 3, 4 and 5 layers composed of 512 neurons each.
DNN vs. QDNN randomly initialized. Tables 1 and 2 show the performances obtained by the straightforward real-valued and by the proposed quaternion deep neural networks, trained without autoencoders and learned with dropout noise [18] to prevent overfitting.
The “Real Test” accuracy observed for the Test data set is obtained with the configuration that reaches the best accuracy on the Development data set.
The “Best Test” accuracy is obtained with the best configuration (number of hidden neurons for the MLP/QMLP or number of hidden layers for the DNN/QDNN) on the Test data set.
Topology   Dev     Best Test   Real Test   Epochs
2-Layers   91.38   84.92       84.30       609
3-Layers   90.80   84          84          649
4-Layers   86.76   85.23       82.39       413
5-Layers   87.36   80.02       77.36       728
Table 1. Summary of accuracies in % obtained by the DNN.
It is worth emphasizing that, as depicted in Table 1, the results observed for the DNN on the development and test data sets drastically decrease as the number of layers increases. This is due to the small size of the training data set (739 documents). Indeed, there are not enough patterns for the DNN to construct a highly abstract representation of the documents. Conversely, Table 2 shows that the proposed QDNN achieves stable performances, with a standard deviation of barely 0.6 on the development set, while the DNN gives more than 2.0. Indeed, the DNN accuracies drop from about 85% with 2/3/4 hidden layers to 80% with 5 hidden layers. This can easily be explained by the random initialization of the large number of neural parameters, which makes it difficult for DNNs to converge to a good optimum. Indeed, the Hamilton product of the QDNN constrains the model to learn the latent relations between each component. Therefore, the best DNN results are observed with only 2 hidden layers (91.38% and 84.30%), while the QDNN obtains 92.52% and 85.23% with 4 layers on the development and test data sets respectively. Finally, the QDNN converged about 6 times faster than the DNN with the same topology (148 epochs for the QDNN versus 728 for the DNN composed of 5 hidden layers, for example).
Topology   Dev     Best Test   Real Test   Epochs
2-Layers   91.95   86.46       84          140
3-Layers   91.95   85.53       85.23       113
4-Layers   92.52   86.46       85.23       135
5-Layers   90.80   85.84       84          148
Table 2. Summary of accuracies in % obtained by the QDNN.

DNN vs. QDNN initialized with dedicated encoder-decoders. Table 3 and Table 4 report the results obtained by the pre-trained DNN-AE and QDNN-AE with dedicated autoencoders (AE for the DNN and QAE for the QDNN). It is worth underlining that the number of epochs required by the DNN-AE to converge is lower than that of the DNN for all the topologies, as depicted in Table 1. Moreover, the accuracies reported for the DNN-AE are more stable and increase with the number of layers.
Topology   Dev     Best Test   Real Test   Epochs
2-Layers   90.23   84          82.46       326
3-Layers   90.80   84.92       83.69       415
4-Layers   91.52   85.23       84.64       364
5-Layers   91.95   85.23       84.30       411
Table 3. Summary of accuracies in % obtained by the DNN-AE.
The same phenomenon is observed with the QDNN-AE, but with a smaller gain in the number of epochs, alongside better reported accuracies. Indeed, the DNN-AE gives an accuracy of 84.30% on the test data set while the QDNN-AE obtains an accuracy of 86.46% in real conditions, a gain of 2.16 points.
Topology   Dev     Best Test   Real Test   Epochs
2-Layers   92.52   86.46       84.61       100
3-Layers   93.57   86.23       85.83       95
4-Layers   92.52   86.46       86.46       88
5-Layers   93.57   86.76       86.46       132
Table 4. Summary of accuracies in % obtained by the QDNN-AE.
Overall, the pre-training process allows each model (DNN-AE/QDNN-AE) to perform better on the theme identification task of telephone conversations. Indeed, both the DNN-AE and the QDNN-AE need fewer epochs (and thus less processing time) and reach better accuracies, thanks to their pre-training process based on dedicated encoder-decoders, which lets them converge quickly to an optimal configuration (weight matrices $w$) during the fine-tuning phase.
4.3. QDNN-AE vs. other neural networks
Table 5 sums up the results obtained on the theme identification task of telephone conversations from the DECODA corpus with different real-valued and quaternion neural networks. The first remark is that the proposed QDNN-AE obtains the best accuracy (86.46%) on both the development and test data sets compared to the real-valued neural networks (deep stacked autoencoder DSAE (82%), MLP (83.38%), DNN (84%) and DNN-AE (84.3%)). Moreover, the randomly initialized QDNN also outperforms all the real-valued neural networks, with an accuracy of 85.23%. We can point out that each quaternion-based neural network performs better than its real-valued equivalent thanks to the Hamilton product (+2.61 points for the QMLP, for example). Finally, the QDNN presents a gain of roughly 1.2 points compared to the real-valued DNN, and the pre-trained QDNN-AE shows an improvement of 2.16 points compared to the DNN-AE.
Models      Type   Dev.    Real Test   Epochs   Impr.
DSAE [34]   R      88.0    82.0        -        -
MLP         R      91.38   83.38       499      +1.38
QMLP        Q      90.38   84.61       381      +2.61
DNN         R      91.38   84          609      -
QDNN        Q      92.52   85.23       135      +1.23
DNN-AE      R      91.95   84.30       411      -
QDNN-AE     Q      93.57   86.46       132      +2.16
Table 5. Summary of accuracies in % obtained by different neural networks on the DECODA framework.
5. CONCLUSION
Summary. This paper proposes a promising deep neural network framework based on the quaternion algebra, coupled with a well-adapted pre-training process made of quaternion encoder-decoders. The initial intuition that the QDNN-AE better captures latent abstract relations between input features, and can generalize from a small corpus thanks to the high dimensionality added by multiple layers, has been demonstrated. It has been shown that a well-suited pre-training process, alongside an increased number of neural parameters, allows the QDNN-AE to outperform all the previously investigated models on the DECODA SLU task. Moreover, this paper shows that quaternion-valued neural networks consistently perform better and faster than real-valued ones, achieving impressive accuracies on the small DECODA corpus with a small number of input features and, therefore, few neural parameters.
Limitations and Future Work. The document segmentation process is a crucial issue when it comes to better capturing latent, temporal and spatial information, and thus needs more investigation to expose the full potential of quaternion-based models. Moreover, the present DNN algorithms are adapted from real-valued ones and do not take into account the entire set of specificities of the quaternion algebra. Therefore, future work will consist in investigating different neural network structures, such as recurrent and convolutional ones, and in proposing well-tailored learning algorithms adapted to hyper-complex numbers (rotations).
6. REFERENCES
[1] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton, “Imagenet classification with deep convolutional neural networks,” in Advances in Neural Information Processing Systems, 2012, pp. 1097–1105.
[2] Dan Ciregan, Ueli Meier, and Jürgen Schmidhuber, “Multi-column deep neural networks for image classification,” in Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on. IEEE, 2012, pp. 3642–3649.
[3] Geoffrey Hinton, Li Deng, Dong Yu, George E. Dahl, Abdel-rahman Mohamed, Navdeep Jaitly, Andrew Senior, Vincent Vanhoucke, Patrick Nguyen, Tara N. Sainath, et al., “Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups,” Signal Processing Magazine, IEEE, vol. 29, no. 6, pp. 82–97, 2012.
[4] George E. Dahl, Dong Yu, Li Deng, and Alex Acero, “Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 20, no. 1, pp. 30–42, 2012.
[5] Alex Graves, Abdel-rahman Mohamed, and Geoffrey Hinton, “Speech recognition with deep recurrent neural networks,” in Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on. IEEE, 2013, pp. 6645–6649.
[6] Tomas Mikolov, Martin Karafiát, Lukas Burget, Jan Černocký, and Sanjeev Khudanpur, “Recurrent neural network based language model,” in Interspeech, 2010, vol. 2, p. 3.
[7] Ken-ichi Funahashi and Yuichi Nakamura, “Approximation of dynamical systems by continuous time recurrent neural networks,” Neural Networks, vol. 6, no. 6, pp. 801–806, 1993.
[8] Felix A. Gers, Jürgen Schmidhuber, and Fred Cummins, “Learning to forget: Continual prediction with LSTM,” 1999.
[9] Alex Graves and Jürgen Schmidhuber, “Framewise phoneme classification with bidirectional LSTM and other neural network architectures,” Neural Networks, vol. 18, no. 5, pp. 602–610, 2005.
[10] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio, “Generative adversarial nets,” in Advances in Neural Information Processing Systems, 2014, pp. 2672–2680.
[11] Paolo Arena, Luigi Fortuna, Giovanni Muscato, and Maria Gabriella Xibilia, “Multilayer perceptrons to approximate quaternion valued functions,” Neural Networks, vol. 10, no. 2, pp. 335–342, 1997.
[12] Titouan Parcollet, Mohamed Morchid, Pierre-Michel Bousquet, Richard Dufour, Georges Linarès, and Renato De Mori, “Quaternion neural networks for spoken language understanding,” in Spoken Language Technology Workshop (SLT), 2016 IEEE. IEEE, 2016, pp. 362–368.
[13] Mohamed Morchid, Georges Linarès, Marc El-Beze, and Renato De Mori, “Theme identification in telephone service conversations using quaternions of speech features,” in Interspeech. ISCA, 2013.
[14] Teijiro Isokawa, Tomoaki Kusakabe, Nobuyuki Matsui, and Ferdinand Peper, “Quaternion neural network and its application,” in Knowledge-Based Intelligent Information and Engineering Systems. Springer, 2003, pp. 318–324.
[15] William Rowan Hamilton, Elements of Quaternions, Longmans, Green, & Company, 1866.
[16] Ronan Collobert and Jason Weston, “A unified architecture for natural language processing: Deep neural networks with multitask learning,” in Proceedings of the 25th International Conference on Machine Learning. ACM, 2008, pp. 160–167.
[17] Xavier Glorot and Yoshua Bengio, “Understanding the difficulty of training deep feedforward neural networks,” in International Conference on Artificial Intelligence and Statistics, 2010, pp. 249–256.
[18] Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov, “Dropout: A simple way to prevent neural networks from overfitting,” The Journal of Machine Learning Research, vol. 15, no. 1, pp. 1929–1958, 2014.
[19] Yoshua Bengio, “Learning deep architectures for AI,” Foundations and Trends in Machine Learning, vol. 2, no. 1, pp. 1–127, 2009.
[20] Ruslan Salakhutdinov and Geoffrey Hinton, “Deep Boltzmann machines,” in Artificial Intelligence and Statistics, 2009, pp. 448–455.
[21] Yoshua Bengio, Pascal Lamblin, Dan Popovici, Hugo Larochelle, et al., “Greedy layer-wise training of deep networks,” Advances in Neural Information Processing Systems, vol. 19, p. 153, 2007.
[22] Toshifumi Minemoto, Teijiro Isokawa, Haruhiko Nishimura, and Nobuyuki Matsui, “Feed forward neural network with random quaternionic neurons,” Signal Processing, vol. 136, pp. 59–68, 2017.
[23] David M. Blei, Andrew Y. Ng, and Michael I. Jordan, “Latent Dirichlet allocation,” The Journal of Machine Learning Research, vol. 3, pp. 993–1022, 2003.
[24] I. L. Kantor, A. S. Solodovnikov, and A. Shenitzer, Hypercomplex Numbers: An Elementary Introduction to Algebras, Springer-Verlag, 1989.
[25] Jack B. Kuipers, Quaternions and Rotation Sequences, Princeton University Press, Princeton, NJ, USA, 1999.
[26] Fuzhen Zhang, “Quaternions and matrices of quaternions,” Linear Algebra and its Applications, vol. 251, pp. 21–57, 1997.
[27] J. P. Ward, Quaternions and Cayley Numbers: Algebra and Applications, vol. 403, Springer, 1997.
[28] Diederik Kingma and Jimmy Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014.
[29] Matthew D. Zeiler, “Adadelta: An adaptive learning rate method,” arXiv preprint arXiv:1212.5701, 2012.
[30] Dumitru Erhan, Yoshua Bengio, Aaron Courville, Pierre-Antoine Manzagol, Pascal Vincent, and Samy Bengio, “Why does unsupervised pre-training help deep learning?,” Journal of Machine Learning Research, vol. 11, no. Feb, pp. 625–660, 2010.
[31] John J. Godfrey, Edward C. Holliman, and Jane McDaniel, “Switchboard: Telephone speech corpus for research and development,” in Acoustics, Speech, and Signal Processing, 1992. ICASSP-92., 1992 IEEE International Conference on. IEEE, 1992, vol. 1, pp. 517–520.
[32] Frederic Bechet, Benjamin Maza, Nicolas Bigouroux, Thierry Bazillon, Marc El-Beze, Renato De Mori, and Eric Arbillot, “DECODA: a call-centre human-human spoken conversation corpus,” in LREC, 2012, pp. 1343–1347.
[33] Georges Linares, Pascal Nocéra, Dominique Massonie, and Driss Matrouf, “The LIA speech recognition system: from 10xRT to 1xRT,” in Text, Speech and Dialogue. Springer, 2007, pp. 302–308.
[34] Killian Janod, Mohamed Morchid, Richard Dufour, Georges Linares, and Renato De Mori, “Deep stacked autoencoders for spoken language understanding,” in ISCA INTERSPEECH, vol. 1, p. 2, 2016.