Conference Paper

Quaternion Neural Networks for Spoken Language Understanding

... For example, many successful applications using complex neural networks have been discussed, e.g. filtering, time-series signal processing, telecommunications, image processing and video processing [Nitta (2009), Hirose (2013)], while quaternion neural networks have also been demonstrated in several applications [Parcollet et al. (2019)], including signal processing [Arena et al. (1996), Buchholz and Bihan (2006)], attitude control [Fortuna et al. (2001)], image processing [Kusamichi et al. (2004)], filtering [Ujang et al. (2011)], pattern classification [Greenblatt and Agaian (2013), Shang and Hirose (2014)] and document processing [Parcollet et al. (2016)]. However, the application of hypercomplex-valued neural networks to control practical systems has not been adequately investigated. ...
Article
This study investigates an adaptive controller applying a neural network in which all the network parameters, states, signals and functions are expressed using hypercomplex numbers and algebras, and its application to the dynamics control of a robot manipulator. To design hypercomplex-valued neural networks, where each neural network is a multilayer feedforward network with a split-type activation function of neurons and a tapped-delay-line input, we consider the following four types of hypercomplex numbers: complex, hyperbolic, bicomplex and quaternion numbers. In the control system, we utilise a feedback error-learning scheme to conduct the training of the network through a back-propagation algorithm. In the computational experiments, we explore a hypercomplex-valued neural network-based controller on a trajectory control problem for a three-link robot manipulator, in which the position of the end-effector follows the desired trajectory in 3-dimensional space. The simulation results validate the feasibility and effectiveness of the quaternion neural network-based controller for this task.
... More recently, neural networks over complex and hypercomplex numbers have received increasing attention [27], [28], [29], [30], and some efforts have shown promising results in different applications. In particular, deep quaternion networks [20], [31], [32], deep quaternion convolutional networks [33], [34], and quaternion recurrent neural networks [35], [36] have been employed for challenging tasks such as image classification, compression and reconstruction, or speech recognition and natural language processing. ...
... As for image processing [37], the real component r is set to zero, and takes other values after the computation performed with the Hamilton product to obtain the first-layer latent features. The quaternion features obtained with UAD feeding a quaternion classifier have shown superior theme identification accuracies on the DECODA corpus compared with results obtained with other types of features [20], [32]. ...
... Consequently, α = 50/T, with T the number of topics, and β = 0.01. The number of topics T has been previously investigated for this task in [20], [32], and is set to 25. More precisely, the outputs of 10 runs of the T = 25 LDA model are concatenated to obtain a final vector of size 25 × 10 = 250, to alleviate run-to-run variation. ...
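As a rough illustration of this feature construction (not the authors' code; the toy corpus and the gensim-based training below are assumptions), one can concatenate the dense document-topic posteriors of 10 independently seeded T = 25 LDA runs:

    import numpy as np
    from gensim import corpora
    from gensim.models import LdaModel

    # Toy stand-in corpus; the real features come from call-center transcripts.
    texts = [["bus", "late", "line"], ["ticket", "refund", "agent"],
             ["schedule", "bus", "stop"]] * 30
    dictionary = corpora.Dictionary(texts)
    corpus = [dictionary.doc2bow(t) for t in texts]

    T = 25
    runs = []
    for seed in range(10):                      # 10 independent LDA runs
        lda = LdaModel(corpus, id2word=dictionary, num_topics=T,
                       alpha=50.0 / T, eta=0.01, random_state=seed, passes=2)
        vec = np.zeros(T)                       # dense topic posterior of document 0
        for topic_id, p in lda.get_document_topics(corpus[0],
                                                   minimum_probability=0.0):
            vec[topic_id] = p
        runs.append(vec)

    feature = np.concatenate(runs)              # shape (250,) = 25 topics x 10 runs
    print(feature.shape)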
Article
Machine learning (ML) and deep learning with deep neural networks (DNN) have drastically improved the performance of modern systems on numerous spoken language understanding (SLU) tasks. Since most current research focuses on new neural architectures to enhance performance in realistic conditions, few recent works have investigated the use of different algebras with neural networks (NN) to better represent the nature of the data being processed. To this extent, quaternion-valued neural networks (QNN) have shown better performance, and an important reduction of the number of neural parameters compared to traditional real-valued neural networks, when dealing with multidimensional signals. Nonetheless, the use of QNNs is strictly limited to quaternion input or output features. This paper introduces a new unsupervised method based on a hybrid autoencoder (AE), called real-to-quaternion autoencoder (R2H), to extract a quaternion-valued input signal from any real-valued data to be processed by QNNs. The experiments performed to identify the most related theme of a given telephone conversation from a customer care service (CCS) demonstrate that the R2H approach outperforms all previously established models, either real- or quaternion-valued, in terms of accuracy and with up to four times fewer neural parameters.
... Finally, the output layer of a QMLP is based on a split activation function with respect to the target task. Consequently, the sigmoid or tanh functions are applied component-wise to the output of the QMLP for a quaternion approximation task (Arena et al. 1997), and a split softmax is used to obtain a posterior distribution for classification purposes (Parcollet et al. 2016). Indeed, target classes are often defined as real-valued binary vectors, and it is therefore convenient to decorrelate the components of the quaternion output with a split softmax. ...
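A minimal sketch of these split functions (quaternions stored as arrays of shape (..., 4); an illustration, not the cited implementation):

    import numpy as np

    def split_activation(q, fn=np.tanh):
        # Apply a real activation (tanh, sigmoid, ...) to each quaternion
        # component (r, x, y, z) independently.
        return fn(q)

    def split_softmax(q_outputs):
        # Split softmax over an output layer of shape (n_classes, 4): each
        # component is normalized separately across classes, decorrelating
        # the four components of the quaternion output.
        e = np.exp(q_outputs - q_outputs.max(axis=0, keepdims=True))
        return e / e.sum(axis=0, keepdims=True)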
... The trick to benefit from QNNs is therefore to find an appropriate task where the input is naturally expressed as a vector of three or four dimensions. Parcollet et al. (2016) first proposed an adapted document segmentation for a theme classification task of telephone conversations from a customer care service (CCS) of a public transportation system. Indeed, CCS conversations can be split into three parts: the first and second parts consist of the customer and agent speech turns respectively, while the third part represents the whole conversation (see the sketch below). ...
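A sketch of this segmentation as quaternion input features (the component layout is assumed from the excerpts above, with the real part left at zero; names are illustrative):

    import numpy as np

    def conversation_quaternions(f_customer, f_agent, f_full):
        # Pack three real feature vectors (e.g. per-topic scores for the
        # customer turns, the agent turns and the whole conversation) into
        # pure quaternions of shape (dim, 4); the real part stays at zero.
        f_customer, f_agent, f_full = map(np.asarray, (f_customer, f_agent, f_full))
        r = np.zeros_like(f_customer)
        return np.stack([r, f_customer, f_agent, f_full], axis=-1)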
... Although there is a consensus about better performance with deeper architectures, all the previously explored QMLPs were one-hidden-layer neural networks, and do not benefit from the high abstraction and generalization capabilities of deep neural networks. Therefore, Parcollet et al. (2017a) combined previous research (Parcollet et al. 2016, 2017b) to propose the first pre-trained quaternion deep neural network (QDNN), with up to 5 hidden layers. As for real-valued NNs, QDNNs are pre-trained with quaternion-valued autoencoders to allow faster convergence. ...
Article
Full-text available
Quaternion neural networks have recently received increasing interest due to noticeable improvements over real-valued neural networks on real-world tasks such as image, speech and signal processing. The extension of quaternion numbers to neural architectures reached state-of-the-art performance with a reduction of the number of neural parameters. This survey provides a review of past and recent research on quaternion neural networks and their applications in different domains. The paper details methods, algorithms and applications for each quaternion-valued neural network proposed.
... The authors of [14] achieve rotational invariance for point-cloud inputs by utilizing quaternion-valued inputs and features; the weights and biases used, however, are real-valued matrices. In the context of speech/language understanding, [15] employs a quaternion MLP approach, where the quaternion model outperforms real-valued counterparts while requiring fewer training epochs. Similarly, [16] utilizes quaternion convolutional neural networks (CNNs) and quaternion recurrent neural networks (RNNs) for this task. ...
... 2) Activations: We employ element-wise activation functions as proposed in [6], [15], [16]. Given an already known and tested activation ψ(·) such as ReLU or tanh, its application to a quaternion input is ...
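The excerpt is cut off at the formula. Consistent with the split activations described in the other excerpts here, the application to a quaternion input q = r + xi + yj + zk would read ψ(q) = ψ(r) + ψ(x)i + ψ(y)j + ψ(z)k, i.e. the known real activation applied to each of the four components independently (a completion inferred from context, not quoted from the cited paper).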
... While real-valued neural networks enjoy great success and a broad range of applications, in recent years, alongside complex-valued models [1, 2, 3, 4, 5, 6], quaternion-valued models [7, 8, 9, 10, 11, 12] have also attracted the interest of researchers and gained more and more popularity. Applications are e.g. ...
... [8] achieve rotational invariance for point-cloud inputs by utilizing quaternion-valued inputs and features; the weights and biases used, however, are real-valued matrices. In the context of language/speech understanding, [9] employs a quaternion MLP approach, whereby [10] utilizes quaternion CNNs and quaternion RNNs to outperform real-valued counterparts. [11] proposes novel quaternion weight initialization strategies as well as quaternion batch normalization to perform image classification and segmentation with quaternion-valued models. ...
Preprint
Full-text available
Quaternion-valued neural networks have experienced rising popularity and interest from researchers in the last years, whereby the derivatives with respect to quaternions needed for optimization are calculated as the sum of the partial derivatives with respect to the real and imaginary parts. However, we can show that the product and chain rules do not hold with this approach. We solve this by employing the GHR calculus and derive quaternion backpropagation based on it. Furthermore, we experimentally verify the functionality of the derived quaternion backpropagation.
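The root of the difficulty is that quaternion multiplication is non-commutative, so derivative rules from real and complex calculus cannot be applied term by term. A quick self-contained check (illustrative, not from the paper):

    import numpy as np

    def hamilton(p, q):
        # Hamilton product of quaternions stored as (r, x, y, z).
        r1, x1, y1, z1 = p
        r2, x2, y2, z2 = q
        return np.array([r1*r2 - x1*x2 - y1*y2 - z1*z2,
                         r1*x2 + x1*r2 + y1*z2 - z1*y2,
                         r1*y2 - x1*z2 + y1*r2 + z1*x2,
                         r1*z2 + x1*y2 - y1*x2 + z1*r2])

    i = np.array([0.0, 1.0, 0.0, 0.0])
    j = np.array([0.0, 0.0, 1.0, 0.0])
    print(hamilton(i, j))   # [0. 0. 0.  1.] ->  ij = k
    print(hamilton(j, i))   # [0. 0. 0. -1.] ->  ji = -k: pq != qp in general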
... Apart from these, quaternion-valued neural networks (QVNN) store and learn the spatial relationships in the various transformations of 3D coordinates [2,3] and between color pixels [26], where real- and complex-valued neural networks fail. These qualities have motivated researchers to apply QVNNs in many fields such as automatic speech recognition [27,28], image classification [29], PolSAR land classification [30], prostate cancer Gleason grading [31], color image compression [32], facial expression recognition [33], robot manipulators [34], spoken language understanding [35], attitude control of spacecraft [36], and banknote classification [37]. However, 3D or 4D information has often been processed using a real-valued neural network (RVNN), where all components are considered separately, which neglects the correlations among them, as addressed in [38]. ...
... In the literature, it has been shown that a neural network in the quaternionic domain learns the amplitude as well as the phase information of quaternionic signals effectively [39]. In [35], the real-valued multi-layer perceptron (MLP) and the quaternion-valued multi-layer perceptron (QMLP) are compared for the identification of spoken dialogues, and it is reported that the QMLP requires fewer epochs and achieves better accuracy than the MLP. From the perspective of a biological neuron, its action potential may have heterogeneous pulse configurations and diverse separation among pulses. ...
Article
Full-text available
The learning algorithm for a three-layered neural structure with novel non-linear quaternionic-valued multiplicative (QVM) neurons is proposed in this paper. The computing capability of non-linear aggregation in the cell body of biological neurons inspired the development of a non-linear neuron model. However, unlike linear neuron models, most non-linear neuron models are built on higher-order aggregation, which is more mathematically complex and difficult to train. As a result, building non-linear neuron models with a simple structure is a difficult and time-consuming endeavor in the neurocomputing field. The concept of the QVM neuron model was influenced by the non-linear neuron model, which has a simple structure and great computational ability. The suggested neuron's linearity is determined by the weight and bias associated with each quaternionic-valued input. Non-commutative multiplication of all linearly connected quaternionic input-weight terms accommodates the non-linearity. To train three-layered networks with QVM neurons, the standard quaternionic-gradient-based backpropagation (QBP) algorithm is utilized. The computational and generalization capabilities of the QVM neuron are assessed through training and testing in the quaternionic domain utilizing benchmark problems, such as 3D and 4D chaotic time-series predictions, 3D geometrical transformations, and 3D face recognition. The training and testing outcomes are compared to conventional and root-power mean (RPM) neurons in the quaternionic domain using training-testing MSEs, network topology (parameters), variance, and AIC as statistical measures. According to these findings, networks with QVM neurons have greater computational and generalization capabilities than networks with conventional and RPM neurons in the quaternionic domain.
... We remark that single quaternion neural networks, including feedforward [15], convolutional [16,17] and recurrent [18] architectures, are well developed. However, the scopes of these papers are completely different, in that these architectures are focused on applications in speech recognition or image classification/segmentation. ...
... To avoid singularities at φ = 0, the already known Taylor series from (15) and the following Taylor series can be used: ...
Preprint
Full-text available
We propose a novel neural network architecture based on dual quaternions, which allow for a compact representation of information, with a main focus on describing rigid body movements. To cover the dynamic behavior inherent to rigid body movements, we propose recurrent architectures in the neural network. To further model the interactions between individual rigid bodies as well as external inputs efficiently, we incorporate a novel attention mechanism employing dual quaternion algebra. The introduced architecture is trainable by means of gradient-based algorithms. We apply our approach to a parcel prediction problem where a rigid body with an initial position, orientation, velocity and angular velocity moves through a fixed simulation environment which exhibits rich interactions between the parcel and the boundaries.
... There are some more recent examples of building models that use quaternions represented as real values. In [15], a quaternion multi-layer perceptron (QMLP) is used for document understanding, and [16] uses a similar approach for processing multi-dimensional signals. ...
Conference Paper
Full-text available
... Quaternion networks have been applied to computer vision [63] and human motion classification [64], [65], where rotations of 3D images or 3D spatial coordinates are common and essential operations. They gained popularity in recommender systems [66], [67] and the natural language processing domain [28], [68], [69]. Parcollet et al. [70] notably proposed quaternion recurrent neural networks for speech recognition. ...
Preprint
Full-text available
With the widespread use of online social networks, hate speech is spreading faster and causing more damage than ever before. Existing hate speech detection methods have limitations in several aspects, such as handling data insufficiency, estimating model uncertainty, improving robustness against malicious attacks, and handling unintended bias (i.e., fairness). There is an urgent need for accurate, robust, and fair hate speech classification in online social networks. To bridge the gap, we design a data-augmented, fairness-addressed, and uncertainty-estimated novel framework. As part of the framework, we propose Bidirectional Quaternion-Quasi-LSTM layers to balance effectiveness and efficiency. To build a generalized model, we combine five datasets collected from three platforms. Experiment results show that our model outperforms eight state-of-the-art methods under both the no-attack scenario and various attack scenarios, indicating the effectiveness and robustness of our model. We share our code along with the combined dataset for future research.
... Since the introduction of the QMLP and its associated training algorithm, researchers have used QMLPs for a variety of tasks. In particular, QMLPs have been used as autoencoders [23], for color image processing [24], text processing [25], and polarized signal processing [26]. Another natural application of quaternions is in robotic control [27], since quaternions can compactly represent 3-dimensional rotation and motion through space. ...
Article
Full-text available
In recent years, real-valued neural networks have demonstrated promising, and often striking, results across a broad range of domains. This has driven a surge of applications utilizing high-dimensional datasets. While many techniques exist to alleviate issues of high-dimensionality, they all induce a cost in terms of network size or computational runtime. This work examines the use of quaternions, a form of hypercomplex numbers, in neural networks. The constructed networks demonstrate the ability of quaternions to encode high-dimensional data in an efficient neural network structure, showing that hypercomplex neural networks reduce the number of total trainable parameters compared to their real-valued equivalents. Finally, this work introduces a novel training algorithm using a meta-heuristic approach that bypasses the need for analytic quaternion loss or activation functions. This algorithm allows for a broader range of activation functions over current quaternion networks and presents a proof-of-concept for future work.
... Finally, several approaches to single quaternion neural networks exist, specifically feedforward [43], convolutional [44], [45] and recurrent [46] networks. However, the scopes of these papers are completely different, in that these architectures are focused on applications in speech recognition or image classification/segmentation. ...
Article
Full-text available
We propose a novel neural network architecture based on dual quaternions, which allows for a compact representation of information, with a main focus on describing rigid body movements. After introducing the underlying dual quaternion math, we derive dual-quaternion-valued neural network layers which are generally applicable to all sorts of problems that can benefit from a mathematical description in dual quaternion space. To cover the dynamic behavior inherent to rigid body movements, we propose recurrent architectures in the neural network. To further model the interactions of rigid bodies efficiently, we incorporate a novel attention mechanism employing dual quaternion algebra. The introduced architecture is trainable by means of gradient-based algorithms. We apply our approach to a parcel prediction problem where a rigid body with an initial position, orientation, velocity and angular velocity moves through a fixed simulation environment which exhibits rich interactions between the parcel and the boundaries. There we show that the dual-quaternion-valued models outperform their counterparts operating on real numbers, confirming the successful introduction of an inductive bias through the usage of dual quaternion math. Furthermore, we use an advantageous custom data augmentation technique specifically tailored to our dual-quaternion-valued input data.
... Recently, an emerging research interest in complex [46], [47] and hypercomplex (i.e., quaternion) representations [30]- [32] has arisen. For example, Parcollet et al. successfully make the first attempt to devise a quaternion multi-layer perceptron for language understanding [48]. For image classification, two highly effective quaternion convolutional neural networks are proposed in [30] and [49], along with fundamental tools like quaternion-valued convolution operation, batch normalization, and initialization. ...
Preprint
Full-text available
As a well-established approach, factorization machine (FM) is capable of automatically learning high-order interactions among features to make predictions without the need for manual feature engineering. With the prominent development of deep neural networks (DNNs), there is a recent and ongoing trend of enhancing the expressiveness of FM-based models with DNNs. However, though better results are obtained with DNN-based FM variants, such performance gains come at the cost of an enormous number (usually millions) of excessive model parameters on top of the plain FM. Consequently, the heavy parameterization impedes the real-life practicality of those deep models, especially efficient deployment on resource-constrained IoT and edge devices. In this paper, we move beyond the traditional real space where most deep FM-based models are defined, and seek solutions from quaternion representations within the hypercomplex space. Specifically, we propose the quaternion factorization machine (QFM) and quaternion neural factorization machine (QNFM), which are two novel lightweight and memory-efficient quaternion-valued models for sparse predictive analytics. By introducing a brand new take on FM-based models with the notion of quaternion algebra, our models not only enable expressive inter-component feature interactions, but also significantly reduce the parameter size due to lower degrees of freedom in the hypercomplex Hamilton product compared with real-valued matrix multiplication. Extensive experimental results on three large-scale datasets demonstrate that QFM achieves 4.36% performance improvement over the plain FM without introducing any extra parameters, while QNFM outperforms all baselines with up to two orders of magnitude fewer parameters in comparison to state-of-the-art peer methods.
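To make the degrees-of-freedom argument concrete (a back-of-the-envelope illustration, not the paper's accounting): one quaternion weight couples a 4-D input block to a 4-D output block through the Hamilton product with 4 real parameters, where a real dense map needs 16.

    n_in, n_out = 128, 128                   # illustrative widths, in quaternion units
    real_params = (4 * n_in) * (4 * n_out)   # real dense layer over the same 4-D features
    quat_params = 4 * (n_in * n_out)         # one quaternion (4 reals) per connection
    print(real_params / quat_params)         # 4.0 -> four times fewer parameters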
... The output of neuron n in layer l, denoted γ_n^l, is defined as [1,27]: ...
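A minimal sketch consistent with this truncated definition, assuming the usual form γ_n^l = ψ(Σ_m w_nm ⊗ x_m + b_n) with the Hamilton product ⊗ and a split activation ψ (the array layout below is an assumption):

    import numpy as np

    def hamilton(p, q):
        # Hamilton product of quaternions stored as (r, x, y, z).
        r1, x1, y1, z1 = p
        r2, x2, y2, z2 = q
        return np.array([r1*r2 - x1*x2 - y1*y2 - z1*z2,
                         r1*x2 + x1*r2 + y1*z2 - z1*y2,
                         r1*y2 - x1*z2 + y1*r2 + z1*x2,
                         r1*z2 + x1*y2 - y1*x2 + z1*r2])

    def quaternion_dense(x, W, b, fn=np.tanh):
        # x: (n_in, 4) quaternion inputs, W: (n_out, n_in, 4) quaternion
        # weights, b: (n_out, 4) biases; returns (n_out, 4) outputs.
        out = np.array(b, dtype=float)
        for n in range(W.shape[0]):
            for m in range(x.shape[0]):
                out[n] += hamilton(W[n, m], x[m])
        return fn(out)                       # split activation, component-wise

    rng = np.random.default_rng(0)
    y = quaternion_dense(rng.normal(size=(3, 4)),
                         0.1 * rng.normal(size=(2, 3, 4)),
                         np.zeros((2, 4)))
    print(y.shape)                           # (2, 4): two quaternion outputs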
Chapter
Full-text available
Machine Learning has recently emerged as a new paradigm for processing all types of information. In particular, Artificial Intelligence is attractive to corporations and research institutions as it provides innovative solutions for unsolved problems, and it enjoys great popularity among the general public. However, despite the fact that Machine Learning offers huge opportunities for the IT industry, Artificial Intelligence technology is still in its infancy, with many issues to be addressed. In this paper, we present a survey of quaternion applications in Neural Networks, one of the most promising research lines in artificial vision which also has great potential in several other topics. The aim of this paper is to provide a better understanding of the design challenges of Quaternion Neural Networks and identify important research directions in this increasingly important area.
... Due to their rich representational capability as well as high flexibility, quaternions have recently attracted much interest and been adopted in many areas, including robotics [20], image classification [9], KG embedding [54], automatic speech recognition [31], and general recommender systems [55]. Besides, the quaternion MLP [29] achieves better performance on spoken language understanding than a standard real-valued MLP, and the quaternion RNN [30] also outperforms traditional RNNs on the phoneme recognition task with less computational burden. In short, quaternions enable neural networks to code both latent inter- and intra-dependencies [54] between multidimensional input features, and show promising performance. ...
... This method simply defines the LBP in the form of quaternions, and solves the problem of information redundancy between adjacent points in the traditional LBP. Many researchers have combined quaternions with neural networks and obtained good results [Parcollet, Morchid, Bousquet et al. (2016), Parcollet, Zhang, Morchid et al. (2018)]. It can be seen from the above that quaternions have obvious advantages in processing color digital images. ...
... Recently, the combination of quaternions and neural networks has received more and more attention, because quaternion numbers allow neural network-based models to code latent inter-dependencies between groups of input features during the learning process with fewer parameters than C-NNs. In particular, deep quaternion networks [29], [30], deep quaternion convolutional networks [31], [32], and deep quaternion recurrent neural networks [33] have been employed for challenging tasks such as image and language processing. However, there is still a lack of investigation into the combination of quaternions and capsule networks, so this is a topic worth studying. ...
Article
Full-text available
Knowledge graphs are collections of factual triples. Link prediction aims to predict missing factual triples in knowledge graphs. In this paper, we present a novel capsule network method for link prediction taking advantage of quaternions. More specifically, we explore two methods: a relational rotation model called QuaR and a deep capsule neural model called CapS-QuaR to encode the semantics of factual triples. The QuaR model defines each relation as a rotation from the head entity to the tail entity in the hyper-complex vector space, which can be used to infer and model diverse relation patterns, including symmetry/anti-symmetry, inversion and composition. Based on these characteristics of quaternions, we use the embeddings of entities and relations trained with QuaR as the input to the CapS-QuaR model. Experimental results on multiple benchmark knowledge graphs show that the proposed method is not only scalable, but also able to predict the correctness of triples in knowledge graphs, and it significantly outperforms existing state-of-the-art models for link prediction. Finally, an evaluation on a real dataset for a search personalization task is conducted to prove the effectiveness of our model.
... Quaternion representations are also useful for enhancing the performance of convolutional neural networks on multiple tasks such as automatic speech recognition [Parcollet et al.] and image classification [Gaudet and Maida, 2018, Parcollet et al., 2018a]. The quaternion multilayer perceptron [Parcollet et al., 2016] and quaternion autoencoders [Parcollet et al., 2017] also outperform the standard MLP and autoencoder. In a nutshell, the major motivation behind these models is that quaternions enable the neural networks to code latent inter- and intra-dependencies between multidimensional input features, thus leading to more compact interactions and better representation capability. ...
Preprint
Full-text available
In this work, we move beyond the traditional complex-valued representations, introducing more expressive hypercomplex representations to model entities and relations for knowledge graph embeddings. More specifically, quaternion embeddings, hypercomplex-valued embeddings with three imaginary components, are utilized to represent entities. Relations are modelled as rotations in the quaternion space. The advantages of the proposed approach are: (1) latent inter-dependencies (between all components) are aptly captured with the Hamilton product, encouraging a more compact interaction between entities and relations; (2) quaternions enable expressive rotation in four-dimensional space and have more degrees of freedom than rotation in the complex plane; (3) the proposed framework is a generalization of ComplEx on hypercomplex space while offering better geometrical interpretations, concurrently satisfying the key desiderata of relational representation learning (i.e., modeling symmetry, anti-symmetry and inversion). Experimental results demonstrate that our method achieves state-of-the-art performance on four well-established knowledge graph completion benchmarks.
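A toy, single-coordinate sketch of this rotation-based scoring (the actual model embeds each entity as a vector of quaternions and includes normalization and training details omitted here; names are illustrative):

    import numpy as np

    def hamilton(p, q):
        # Hamilton product of quaternions stored as (r, x, y, z).
        r1, x1, y1, z1 = p
        r2, x2, y2, z2 = q
        return np.array([r1*r2 - x1*x2 - y1*y2 - z1*z2,
                         r1*x2 + x1*r2 + y1*z2 - z1*y2,
                         r1*y2 - x1*z2 + y1*r2 + z1*x2,
                         r1*z2 + x1*y2 - y1*x2 + z1*r2])

    def score(head, relation, tail):
        # Normalize the relation to a unit quaternion (a pure rotation),
        # rotate the head with the Hamilton product, then score against
        # the tail with the quaternion inner product.
        r_unit = relation / np.linalg.norm(relation)
        return float(np.dot(hamilton(head, r_unit), tail))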
... Similarly, a QCNN cannot work without activation functions. Many activation functions have been proposed for quaternions [11]; the split activation applied in the proposed model (the method is mentioned in [12], [13]) is defined as follows: ...
Article
Full-text available
The convolutional neural network is widely popular for solving color image feature extraction problems. However, in general networks, the interrelationships among the color image channels are neglected. Therefore, a novel quaternion convolutional neural network (QCNN) is proposed in this paper, which always treats color triples as a whole to avoid information loss. The original quaternion convolution operation is presented and constructed to fully mix the information of the color channels. The quaternion batch normalization and pooling operations are derived and designed in the quaternion domain to further ensure the integrity of color information. Meanwhile, knowledge of the attention mechanism is incorporated to boost the performance of the proposed QCNN. Experiments demonstrate that the proposed model is more efficient than the traditional convolutional neural network and another QCNN with the same structure, and has better performance in color image classification and color image forensics.
... The weighted root-power mean, as the aggregation function of the proposed neuron model with quaternionic-valued signals, yields a natural and general model that includes various existing neuron models as special cases, depending on the domain of the input signals and the value of the power coefficient. However, quaternionic-valued networks with conventional neurons have been used in PolSAR land classification [55] and spoken language understanding [53]. ...
Article
This paper illustrates a new structure of artificial neuron based on root-power means (RPM) for quaternionic-valued signals and presents an efficient learning process for neural networks with quaternionic-valued root-power mean neurons (H-RPMN). The main aim of this neuron is to present the potential capability of a nonlinear aggregation operation on the quaternionic-valued signals in the neuron cell. The wide spectrum of aggregation ability of the RPM between minima and maxima has the property of changing its degree of compensation in a natural way, which emulates various existing neuron models as its special cases. Further, the quaternionic resilient propagation algorithm (H-RPROP) with an error-dependent weight backtracking step significantly accelerates the training speed and exhibits better approximation accuracy. A wide spectrum of benchmark problems is considered to evaluate the performance of the proposed quaternionic root-power mean neuron with the H-RPROP learning algorithm.
... There are some more recent examples of building models that use quaternions represented as real values. In [15], a quaternion multi-layer perceptron (QMLP) is used for document understanding, and [16] uses a similar approach for processing multi-dimensional signals. ...
Article
Full-text available
The field of deep learning has seen significant advancement in recent years. However, much of the existing work has been focused on real-valued numbers. Recent work has shown that a deep learning system using the complex numbers can be deeper for a fixed parameter budget compared to its real-valued counterpart. In this work, we explore the benefits of generalizing one step further into the hyper-complex numbers, quaternions specifically, and provide the architecture components needed to build deep quaternion networks. We go over quaternion convolutions, present a quaternion weight initialization scheme, and present algorithms for quaternion batch-normalization. These pieces are tested in a classification model by end-to-end training on the CIFAR-10 and CIFAR-100 data sets and a segmentation model by end-to-end training on the KITTI Road Segmentation data set. The quaternion networks show improved convergence compared to real-valued and complex-valued networks, especially on the segmentation task.
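A sketch in the spirit of such a quaternion initialization (polar form with a Rayleigh-distributed magnitude and a random unit imaginary axis; the Glorot-style variance used below is an assumption, not necessarily the paper's exact criterion):

    import numpy as np

    def quaternion_init(n_in, n_out, rng=np.random.default_rng(0)):
        # Draw weights w = |w| (cos(theta) + u sin(theta)) with a random
        # unit pure quaternion u, Rayleigh magnitude |w| and uniform angle.
        sigma = 1.0 / np.sqrt(2.0 * (n_in + n_out))        # Glorot-style scale
        mag = rng.rayleigh(scale=sigma, size=(n_out, n_in))
        theta = rng.uniform(-np.pi, np.pi, size=(n_out, n_in))
        u = rng.normal(size=(n_out, n_in, 3))
        u /= np.linalg.norm(u, axis=-1, keepdims=True)     # unit imaginary axis
        w = np.empty((n_out, n_in, 4))
        w[..., 0] = mag * np.cos(theta)                    # real part
        w[..., 1:] = (mag * np.sin(theta))[..., None] * u  # i, j, k parts
        return w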
... The processing in a quaternionic-valued neural network (QVNN) is as simple as in conventional neural networks, and the error back-propagation algorithm in the quaternionic domain (ℍ-BP) [11] has been obtained using the concept of gradient-descent optimization. The superiority and capability of QVNNs are verified by recent publications in the areas of inverse kinematics of robot manipulators [12], nonlinear adaptive filtering [13], spoken language understanding [14], and PolSAR land classification [15]. In QVNNs, split-type activation functions are used instead of analytic ones because of the merits investigated in the past for complex-valued neural networks [3,7,8]. ...
Article
Full-text available
In the last few years there has been a growing number of studies concerning the introduction of quaternions into neural networks, which demand a faster learning technique with superior performance. In this paper, we propose a fast but novel quaternionic resilient propagation (H-RPROP) algorithm for high-dimensional problems. It achieves significantly faster learning than the quaternionic-domain back-propagation (H-BP) algorithm. Slow convergence and the instability of weight updates around local minima are the main drawbacks of H-BP. The gradient-descent-based H-BP algorithm takes the value of the partial derivative (error gradient) and scales the weight updates through a learning rate, while H-RPROP does not use the value of the partial derivatives but considers only their sign, which indicates the direction of each component of the quaternionic weight update; the step size is instead increased by a constant factor while successive gradients agree in sign, in order to accelerate convergence in shallow regions. H-RPROP computes an individual delta for each connection of the network, which determines the size of the weight update. Therefore, faster convergence and higher accuracy are the main key features of the proposed algorithm. The intelligent behavior of the proposed learning approach is demonstrated through a wide spectrum of prediction problems with different statistical performance evaluation metrics. In order to illustrate the learning and generalization of 3D motion as its inherent behavior, a solid set of experiments is presented where the training is performed through input-output mapping over a line and the generalization ability is verified over various non-linear geometrical objects. The slow convergence problem of the back-propagation algorithm has been well combated by H-RPROP, which has consistently demonstrated a drastic reduction in training cycles.
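A component-wise sketch of this sign-based update rule (the constants are the usual RPROP defaults; whether H-RPROP uses exactly these values is an assumption):

    import numpy as np

    def rprop_step(w, grad, prev_grad, delta,
                   eta_plus=1.2, eta_minus=0.5, d_min=1e-6, d_max=50.0):
        # Grow the per-weight step when the gradient sign repeats, shrink
        # it (and backtrack, i.e. skip the update) when the sign flips.
        agree = grad * prev_grad
        delta = np.where(agree > 0, np.minimum(delta * eta_plus, d_max), delta)
        delta = np.where(agree < 0, np.maximum(delta * eta_minus, d_min), delta)
        grad = np.where(agree < 0, 0.0, grad)  # weight backtracking step
        w = w - np.sign(grad) * delta          # only the sign is used
        return w, grad, delta                  # grad becomes the next prev_grad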
Article
The neurocomputing community has focused much interest on quaternionic-valued neural networks (QVNNs) due to their natural fit for quaternionic signals, their learning of inter-channel and spatial relationships between features, and their remarkable improvements over real-valued neural networks (RVNNs) and complex-valued neural networks (CVNNs). The excellent learning capability of QVNNs has inspired researchers working on various applications in image processing, signal processing, computer vision, and robotic control systems. Apart from these applications, many researchers have proposed new structures of quaternionic neurons and extended the QVNN architecture for specific applications containing high-dimensional information. These networks have revealed their performance with fewer parameters than conventional RVNNs. This paper focuses on past and recent studies of simple and deep QVNN architectures and their applications. It provides future directions for prospective researchers to establish new architectures and to extend existing high-dimensional neural network architectures with the help of quaternions, octonions, or sedenions for appropriate applications.
Article
Due to the sparsity of available features in web-scale predictive analytics, combinatorial features become a crucial means for deriving accurate predictions. As a well-established approach, a factorization machine (FM) is capable of automatically learning high-order interactions among features to make predictions without the need for manual feature engineering. With the prominent development of deep neural networks (DNNs), there is a recent and ongoing trend of enhancing the expressiveness of FM-based models with DNNs. However, though better results are obtained with DNN-based FM variants, such performance gains come at the cost of an enormous number (usually millions) of excessive model parameters on top of the plain FM. Consequently, the heavy parameterization impedes the real-life practicality of those deep models, especially efficient deployment on resource-constrained Internet of Things (IoT) and edge devices. In this article, we move beyond the traditional real space where most deep FM-based models are defined and seek solutions from quaternion representations within the hypercomplex space. Specifically, we propose the quaternion factorization machine (QFM) and quaternion neural factorization machine (QNFM), which are two novel lightweight and memory-efficient quaternion-valued models for sparse predictive analytics. By introducing a brand new take on FM-based models with the notion of quaternion algebra, our models not only enable expressive inter-component feature interactions but also significantly reduce the parameter size due to lower degrees of freedom in the hypercomplex Hamilton product compared with real-valued matrix multiplication. Extensive experimental results on three large-scale datasets demonstrate that QFM achieves 4.36% performance improvement over the plain FM without introducing any extra parameters, while QNFM outperforms all baselines with up to two orders of magnitude fewer parameters in comparison to state-of-the-art peer methods.
Chapter
Signal representation and processing are the backbone of mathematically-aided engineering. Among the myriad of ideas and results in that realm, many sorts of algorithms and techniques capable of learning from experience have taken the stage in the last decades, with a crescendo of great successes in a variety of fronts in recent years. This paper provides a sketchy outline of those developments that seem more relevant or promising in view of their bearing on the geometric calculus (multivector) representations of signals and the concomitant automatic learning algorithms. The corresponding artificial neurons, and their organization in networks, may be seen as a way to transcend the biologically inspired neuron networks much as the wheel or aviation transcended legs or bird flight. Recent developments suggest that there are exciting research opportunities ahead.
Preprint
Deep learning is a hot research topic in the field of machine learning. Real-valued neural networks (Real NNs), especially deep real networks (DRNs), have been widely used in many research fields. In recent years, deep complex networks (DCNs) and deep quaternion networks (DQNs) have attracted more and more attention. The octonion algebra, which is an extension of complex and quaternion algebra, can provide a more efficient and compact expression. This paper constructs a general framework of deep octonion networks (DONs) and provides the main building blocks of DONs such as octonion convolution, octonion batch normalization and octonion weight initialization; DONs are then used in image classification tasks on the CIFAR-10 and CIFAR-100 data sets. Compared with DRNs, DCNs, and DQNs, the proposed DONs have better convergence and higher classification accuracy. The success of DONs is also explained by multi-task learning.
Article
Full-text available
The choice of transfer functions may strongly influence the complexity and performance of neural networks. Although sigmoidal transfer functions are the most common, there is no a priori reason why models based on such functions should always provide optimal decision borders. A large number of alternative transfer functions have been described in the literature. A taxonomy of activation and output functions is proposed, and the advantages of various non-local and local neural transfer functions are discussed. Several less-known types of transfer functions and new combinations of activation/output functions are described. Universal transfer functions, parametrized to change from localized to delocalized type, are of greatest interest. Other types of neural transfer functions discussed here include functions with activations based on non-Euclidean distance measures, bicentral functions formed from products or linear combinations of pairs of sigmoids, and extensions of such functions making rotations...
Article
Full-text available
The goal of the DECODA project is to reduce the development cost of Speech Analytics systems by reducing the need for manual annotation. This project aims to propose robust speech data mining tools in the framework of call-center monitoring and evaluation, by means of weakly supervised methods. The applicative framework of the project is the call-center of the RATP (Paris public transport authority). This project tackles two very important open issues in the development of speech mining methods from spontaneous speech recorded in call-centers: robustness (how to extract relevant information from very noisy and spontaneous speech messages) and weak supervision (how to reduce the annotation effort needed to train and adapt recognition and classification models). This paper describes the DECODA corpus collected at the RATP during the project. We present the different annotation levels performed on the corpus, the methods used to obtain them, as well as some evaluation of the quality of the annotations produced.
Conference Paper
Full-text available
The paper introduces new features for describing possible focus variation in a human/human conversation. The application considered is a real-life telephone customer care service. The purpose is to hypothesize the dominant theme of conversations initiated by casual customers calling the service. Conversations are processed by an automatic speech recognition system that provides hypotheses used for extracting word frequencies. Features are extracted in different, broadly defined and partially overlapping, time segments. Combinations of each feature in different segments are represented in a quaternion algebra framework. The advantage of the proposed approach is made evident by statistically significant improvements in theme classification accuracy.
Conference Paper
Full-text available
We describe latent Dirichlet allocation (LDA), a generative probabilistic model for collections of discrete data such as text corpora. LDA is a three-level hierarchical Bayesian model, in which each item of a collection is modeled as a finite mixture over an underlying set of topics. Each topic is, in turn, modeled as an infinite mixture over an underlying set of topic probabilities. In the context of text modeling, the topic probabilities provide an explicit representation of a document. We present efficient approximate inference techniques based on variational methods and an EM algorithm for empirical Bayes parameter estimation. We report results in document modeling, text classification, and collaborative filtering, comparing to a mixture of unigrams model and the probabilistic LSI model.
Conference Paper
Full-text available
The LIA developed a speech recognition toolkit providing most of the components required by speech-to-text systems. This toolbox allowed building a Broadcast News (BN) transcription system that was involved in the ESTER evaluation campaign [3], on unconstrained transcription and real-time transcription tasks. In this paper, we describe the techniques we used to reach real time, starting from our baseline 10xRT system. We focus on some aspects of the A* search algorithm which are critical for both efficiency and accuracy. Then, we evaluate the impact of the different system components (lexicon, language models and acoustic models) on the trade-off between efficiency and accuracy. Experiments are carried out in the framework of the ESTER evaluation campaign. Our results show that the real-time system reaches a performance about 5.6% absolute WER worse than the standard 10xRT system, with an absolute WER (Word Error Rate) of about 26.8%.
Article
Full-text available
A first step in identifying the content of a document is determining which topics that document addresses. We describe a generative model for documents, introduced by Blei, Ng, and Jordan [Blei, D. M., Ng, A. Y. & Jordan, M. I. (2003) J. Machine Learn. Res. 3, 993-1022], in which each document is generated by choosing a distribution over topics and then choosing each word in the document from a topic selected according to this distribution. We then present a Markov chain Monte Carlo algorithm for inference in this model. We use this algorithm to analyze abstracts from PNAS by using Bayesian model selection to establish the number of topics. We show that the extracted topics capture meaningful structure in the data, consistent with the class designations provided by the authors of the articles, and outline further applications of this analysis, including identifying "hot topics" by examining temporal dynamics and tagging abstracts to illustrate semantic content.
Article
Full-text available
Statistical language models used in large-vocabulary speech recognition must properly encapsulate the various constraints, both local and global, present in the language. While local constraints are readily captured through n-gram modeling, global constraints, such as long-term semantic dependencies, have been more difficult to handle within a data-driven formalism. This paper focuses on the use of latent semantic analysis, a paradigm that automatically uncovers the salient semantic relationships between words and documents in a given corpus. In this approach, (discrete) words and documents are mapped onto a (continuous) semantic vector space, in which familiar clustering techniques can be applied. This leads to the specification of a powerful framework for automatic semantic classification, as well as the derivation of several language model families with various smoothing properties. Because of their large-span nature, these language models are well suited to complement conventional n-grams. An integrative formulation is proposed for harnessing this synergy, in which the latent semantic information is used to adjust the standard n-gram probability. Such hybrid language modeling compares favorably with the corresponding n-gram baseline: experiments conducted on the Wall Street Journal domain show a reduction in average word error rate of over 20%. This paper concludes with a discussion of intrinsic tradeoffs, such as the influence of training data selection on the resulting performance.
Article
Quaternions are a class of hypercomplex number systems, a four-dimensional extension of imaginary numbers, which are extensively used in various fields such as modern physics and computer graphics. Although the number of applications of neural networks employing quaternions is comparatively less than that of complex-valued neural networks, it has been increasing recently. In this chapter, the authors describe two types of quaternionic neural network models. One type is a multilayer perceptron based on 3D geometrical affine transformations by quaternions. The operations that can be performed in this network are translation, dilatation, and spatial rotation in three-dimensional space. Several examples are provided in order to demonstrate the utility of this network. The other type is a Hopfield-type recurrent network whose parameters are directly encoded into quaternions. The stability of this network is demonstrated by proving that the energy decreases monotonically with respect to the change in neuron states. The fundamental properties of this network are presented through the network with three neurons.
Article
We have recently shown that deep Long Short-Term Memory (LSTM) recurrent neural networks (RNNs) outperform feed forward deep neural networks (DNNs) as acoustic models for speech recognition. More recently, we have shown that the performance of sequence trained context dependent (CD) hidden Markov model (HMM) acoustic models using such LSTM RNNs can be equaled by sequence trained phone models initialized with connectionist temporal classification (CTC). In this paper, we present techniques that further improve performance of LSTM RNN acoustic models for large vocabulary speech recognition. We show that frame stacking and reduced frame rate lead to more accurate models and faster decoding. CD phone modeling leads to further improvements. We also present initial results for LSTM RNN models outputting words directly.
Article
We trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes. On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0%, which is considerably better than the previous state-of-the-art. The neural network, which has 60 million parameters and 650,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax. To make training faster, we used non-saturating neurons and a very efficient GPU implementation of the convolution operation. To reduce overfitting in the fully-connected layers we employed a recently-developed regularization method called "dropout" that proved to be very effective. We also entered a variant of this model in the ILSVRC-2012 competition and achieved a winning top-5 test error rate of 15.3%, compared to 26.2% achieved by the second-best entry.
Article
Presents parameter estimation methods common with discrete probability distributions, which is of particular interest in text modeling. Starting with maximum likelihood, a posteriori and Bayesian estimation, central concepts like conjugate distributions and Bayesian networks are reviewed. As an application, the model of latent Dirichlet allocation (LDA) is explained in detail with a full derivation of an approximate inference algorithm based on Gibbs sampling, including a discussion of Dirichlet hyperparameter estimation. Finally, analysis methods of LDA models are discussed.
Article
In this paper a new type of multilayer feedforward neural network is introduced. Such a structure, called hypercomplex multilayer perceptron (HMLP), is developed in quaternion algebra and allows quaternionic input and output signals to be dealt with, requiring a lower number of neurons than the real MLP, thus providing a reduced computational complexity. The structure introduced represents a generalization of the multilayer perceptron in the complex space (CMLP) reported in the literature. The fundamental result reported in the paper is a new density theorem which makes HMLPs universal interpolators of quaternion valued continuous functions. Moreover the proof of the density theorem can be restricted in order to formulate a density theorem in the complex space. Due to the identity between the quaternion and the four-dimensional real space, such a structure is also useful to approximate multidimensional real valued functions with a lower number of real parameters, decreasing the probability of being trapped in local minima during the learning phase. A numerical example is also reported in order to show the efficiency of the proposed structure. © 1997 Elsevier Science Ltd. All Rights Reserved.
Article
We give a brief survey on quaternions and matrices of quaternions, present new proofs for certain known results, and discuss the quaternionic analogues of complex matrices. The methods of converting a quaternion matrix to a pair of complex matrices and homotopy theory are emphasized.
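For reference, the conversion alluded to here writes a quaternion matrix as Q = A + Bj with complex blocks A and B, and maps it to its complex adjoint, a representation that preserves products:

    χ(Q) = [  A         B       ]
           [ -conj(B)   conj(A) ],   with χ(PQ) = χ(P) χ(Q).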
Article
We describe latent Dirichlet allocation (LDA), a generative probabilistic model for collections of discrete data such as text corpora. LDA is a three-level hierarchical Bayesian model, in which each item of a collection is modeled as a finite mixture over an underlying set of topics. Each topic is, in turn, modeled as an infinite mixture over an underlying set of topic probabilities. In the context of text modeling, the topic probabilities provide an explicit representation of a document. We present efficient approximate inference techniques based on variational methods and an EM algorithm for empirical Bayes parameter estimation. We report results in document modeling, text classification, and collaborative filtering, comparing to a mixture of unigrams model and the probabilistic LSI model.
Article
Three methods for the formulation of the kinematic equations of robots with rigid links are presented in this paper. The first and most common method in the robotics community is based on 4x4 homogeneous matrix transformation, the second one is based on Lie algebra, and the third one on screw theory expressed via dual quaternions algebra. These three methods are compared in this paper for their use in the kinematic analysis of robot arms. The basic theory and the transformation operators, upon which every method is based, are referenced. Three analytic algorithms are presented for the solution of the direct kinematic problem corresponding to each method, and the geometric significance of the transformation operators and parameters is explained. Finally, a comparative study on the computation and storage requirements for the three methods is worked out.
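A compact sketch of the dual-quaternion algebra behind the third method (the array layout and helper names are illustrative):

    import numpy as np

    def hamilton(p, q):
        # Hamilton product of quaternions stored as (r, x, y, z).
        r1, x1, y1, z1 = p
        r2, x2, y2, z2 = q
        return np.array([r1*r2 - x1*x2 - y1*y2 - z1*z2,
                         r1*x2 + x1*r2 + y1*z2 - z1*y2,
                         r1*y2 - x1*z2 + y1*r2 + z1*x2,
                         r1*z2 + x1*y2 - y1*x2 + z1*r2])

    def rigid_to_dual(q_rot, t_xyz):
        # Encode rotation (unit quaternion q_rot) plus translation t as the
        # dual quaternion q_rot + eps * (0.5 * t * q_rot).
        t = np.array([0.0, *t_xyz])
        return q_rot, 0.5 * hamilton(t, q_rot)

    def dual_mul(a, b):
        # (ar + eps ad)(br + eps bd) = ar br + eps (ar bd + ad br), since
        # eps^2 = 0; composing dual quaternions composes rigid transforms.
        (ar, ad), (br, bd) = a, b
        return hamilton(ar, br), hamilton(ar, bd) + hamilton(ad, br)

    identity = np.array([1.0, 0.0, 0.0, 0.0])
    step = rigid_to_dual(identity, (1.0, 0.0, 0.0))  # translate 1 unit along x
    q_r, q_d = dual_mul(step, step)
    print(2.0 * q_d[1:])                             # [2. 0. 0.]: translation doubled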
Article
The generative aspect model is an extension of the multinomial model for text that allows word probabilities to vary stochastically across documents.
Article
In this paper, we propose a keyword extraction method for the dictation of radio news which consists of several domains. In our method, newspaper articles which are automatically classified into suitable domains are used in order to calculate feature vectors. The feature vectors show term-domain interdependence and are used for selecting a suitable domain for each part of the radio news. Keywords are extracted by using the selected domain. The results of keyword extraction experiments showed that our methods are robust and effective for the dictation of radio news.