Conference Paper

Sequence to Sequence Learning with Neural Networks

Authors: Ilya Sutskever, Oriol Vinyals, Quoc V. Le

Abstract

Deep Neural Networks (DNNs) are powerful models that have achieved excellent performance on difficult learning tasks. Although DNNs work well whenever large labeled training sets are available, they cannot be used to map sequences to sequences. In this paper, we present a general end-to-end approach to sequence learning that makes minimal assumptions on the sequence structure. Our method uses a multilayered Long Short-Term Memory (LSTM) to map the input sequence to a vector of a fixed dimensionality, and then another deep LSTM to decode the target sequence from the vector. Our main result is that on an English to French translation task from the WMT-14 dataset, the translations produced by the LSTM achieve a BLEU score of 34.7 on the entire test set, where the LSTM's BLEU score was penalized on out-of-vocabulary words. Additionally, the LSTM did not have difficulty on long sentences. For comparison, a strong phrase-based SMT system achieves a BLEU score of 33.3 on the same dataset. When we used the LSTM to rerank the 1000 hypotheses produced by the aforementioned SMT system, its BLEU score increases to 36.5, which beats the previous state of the art. The LSTM also learned sensible phrase and sentence representations that are sensitive to word order and are relatively invariant to the active and the passive voice. Finally, we found that reversing the order of the words in all source sentences (but not target sentences) improved the LSTM's performance markedly, because doing so introduced many short term dependencies between the source and the target sentence which made the optimization problem easier.
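To connect the abstract to concrete code, the following is a minimal, illustrative PyTorch sketch of the described architecture. The 4 layers, 1000 units, 1000-dimensional embeddings, and 160k/80k vocabularies follow the setup reported in the paper; the wiring details are assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    """Minimal sketch of the paper's encoder-decoder LSTM (sizes from the paper,
    wiring assumed)."""
    def __init__(self, src_vocab=160_000, tgt_vocab=80_000,
                 emb=1000, hidden=1000, layers=4):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb)
        self.tgt_emb = nn.Embedding(tgt_vocab, emb)
        self.encoder = nn.LSTM(emb, hidden, num_layers=layers, batch_first=True)
        self.decoder = nn.LSTM(emb, hidden, num_layers=layers, batch_first=True)
        self.out = nn.Linear(hidden, tgt_vocab)

    def forward(self, src, tgt):
        # Reverse the source sequence: the trick the abstract credits with
        # introducing short-term dependencies that ease optimization.
        _, state = self.encoder(self.src_emb(src.flip(dims=[1])))
        # Decode the target from the fixed-size encoder state (teacher forcing).
        dec_out, _ = self.decoder(self.tgt_emb(tgt), state)
        return self.out(dec_out)
```

At inference time, decoding would proceed token by token from the encoder state, e.g., with the beam search the paper uses.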

... Historical Average (HA), VAR [23], and SVR [30] are traditional methods. FC-LSTM [32], DCRNN [20], Graph WaveNet [36], ASTGCN [11], and STSGCN [31] are typical deep learning methods. GMAN [41], MTGNN [35], and GTS [29] are recent state-of-the-art works. ...
... STEP also outperforms STEP w/o reg, showing that the long-sequence representations of TSFormer are superior in improving the graph quality. In addition, as mentioned in Section 1, DCRNN represents a large class of STGNNs [21,26,29,41] that are based on the seq2seq [32] architecture. We fuse the representation of TSFormer into the latent representations of the seq2seq encoder according to Eq.(5). ...
... The accuracy of multivariate time series forecasting has been largely improved by artificial intelligence [37], especially deep learning techniques. Among these techniques, Spatial-Temporal Graph Neural Networks (STGNNs) are the most promising methods, which combine Graph Neural Networks (GNNs) [7,18] and sequential models [6,32] to model the spatial and temporal dependency jointly. Graph WaveNet [36], MTGNN [35], STGCN [38], and StemGNN [4] combine graph convolutional networks and gated temporal convolutional networks with their variants. ...
Preprint
Full-text available
Multivariate Time Series (MTS) forecasting plays a vital role in a wide range of applications. Recently, Spatial-Temporal Graph Neural Networks (STGNNs) have become increasingly popular MTS forecasting methods. STGNNs jointly model the spatial and temporal patterns of MTS through graph neural networks and sequential models, significantly improving the prediction accuracy. But limited by model complexity, most STGNNs only consider short-term historical MTS data, such as data from the past hour. However, the patterns of time series and the dependencies between them (i.e., the temporal and spatial patterns) need to be analyzed based on long-term historical MTS data. To address this issue, we propose a novel framework, in which STGNN is Enhanced by a scalable time series Pre-training model (STEP). Specifically, we design a pre-training model to efficiently learn temporal patterns from very long-term historical time series (e.g., the past two weeks) and generate segment-level representations. These representations provide contextual information for short-term time series input to STGNNs and facilitate modeling dependencies between time series. Experiments on three public real-world datasets demonstrate that our framework is capable of significantly enhancing downstream STGNNs, and our pre-training model aptly captures temporal patterns.
... Deep recurrent neural networks have seen tremendous success in the last decade across domains like NLP, speech and audio processing [1], computer vision [2], time series classification, forecasting and so on. In particular, they have achieved state-of-the-art performance (and beyond) in tasks like handwriting recognition [3], speech recognition [4], [5], machine translation [6], [7] and image captioning [8], [9], to name a few. A salient feature in all these applications that RNNs exploit during learning is the sequential aspect of the data. ...
... Encoder-Decoder (ED) or Seq2Seq architectures, used to map a variable-length sequence to another variable-length sequence, were first successfully applied to machine translation [6], [7] tasks. From then on, the ED framework has been successfully applied in many other tasks like speech recognition [14], image captioning, etc. ...
... We first allow all coefficients in eqn. (7) to be unconstrained. We further assume that the P groups of consecutive values (indicated in (c) above) need not be of the same size p. ...
Preprint
Full-text available
Deep learning (DL) in general and recurrent neural networks (RNNs) in particular have seen high levels of success in sequence-based applications. This paper pertains to RNNs for time series modelling and forecasting. We propose a novel RNN architecture capturing (stochastic) seasonal correlations intelligently while being capable of accurate multi-step forecasting. It is motivated by the well-known encoder-decoder (ED) architecture and the multiplicative seasonal auto-regressive model. It incorporates multi-step (multi-target) learning even in the presence (or absence) of exogenous inputs. It can be employed on single or multiple sequence data. For the multiple sequence case, we also propose a novel greedy recursive procedure to build (one or more) predictive models across sequences when per-sequence data is scarce. We demonstrate via extensive experiments the utility of our proposed architecture in both single sequence and multiple sequence scenarios.
... In this paper, we explore the use of recurrent encoder-decoder neural networks (e.g., Sequence2Sequence [29]) for open story generation. A recurrent encoder-decoder neural network is trained to predict the next token(s) in a sequence, given one or more input tokens. ...
... The event2event network is a recurrent multi-layer encoder-decoder network based on [29]. Unless otherwise stated in experiments below, our event2event network is trained with input ...
... For each experiment, we trained a long short-term memory (LSTM) sequence-to-sequence recurrent neural net (RNN) [29] using Tensorflow [1]. Each RNN was trained with the same parameters (0.5 learning rate, 0.99 learning rate decay, 5.0 maximum gradient, 64 batch size, 1024 model layer size, and 4 layers), varying only the input/output, the bucket size, the number of epochs and the vocabulary. ...
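As an aside, the hyperparameters quoted above map naturally onto a standard training loop. The sketch below is in PyTorch rather than the TensorFlow used in the cited experiments, and the model wiring is an assumption; only the numeric values come from the snippet.

```python
import torch
import torch.nn as nn

# Values quoted in the snippet above; everything else is illustrative.
LEARNING_RATE = 0.5
LR_DECAY = 0.99
MAX_GRAD_NORM = 5.0   # "5.0 maximum gradient" reads as gradient-norm clipping
BATCH_SIZE = 64
HIDDEN_SIZE = 1024
NUM_LAYERS = 4

model = nn.LSTM(input_size=HIDDEN_SIZE, hidden_size=HIDDEN_SIZE,
                num_layers=NUM_LAYERS)
optimizer = torch.optim.SGD(model.parameters(), lr=LEARNING_RATE)
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=LR_DECAY)

def train_step(loss: torch.Tensor) -> None:
    """One optimization step with gradient clipping, as the hyperparameters imply."""
    optimizer.zero_grad()
    loss.backward()
    nn.utils.clip_grad_norm_(model.parameters(), MAX_GRAD_NORM)
    optimizer.step()
```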
Article
Full-text available
Automated story generation is the problem of automatically selecting a sequence of events, actions, or words that can be told as a story. We seek to develop a system that can generate stories by learning everything it needs to know from textual story corpora. To date, recurrent neural networks that learn language models at character, word, or sentence levels have had little success generating coherent stories. We explore the question of event representations that provide a mid-level of abstraction between words and sentences in order to retain the semantic information of the original data while minimizing event sparsity. We present a technique for preprocessing textual story data into event sequences. We then present a technique for automated story generation whereby we decompose the problem into the generation of successive events (event2event) and the generation of natural language sentences from events (event2sentence). We give empirical results comparing different event representations and their effects on event successor generation and the translation of events to natural language.
... Without the need for recurrence or convolutions and fully relying on the attention mechanism, the Transformer model was originally designed for mapping internal relationships among the elements of paired sequences of word embeddings for neural machine translation. Transformer employs a self-attention module (depicted in Figure 1(a)) inside a multi-head attention module, creating a wide parallelized model that can be trained faster than models with recurrences [5,7,8] or convolutions [9,10] for neural machine translation. ...
... Looking at the whole story, the first notable moment in the evolution path of neural networks, from over half a century ago until today, came around a decade ago, when AlexNet [11], the first deep convolutional neural network practically capable of solving the image classification problem end-to-end at a large scale, was introduced. This sparked curiosity among other researchers to investigate the usefulness of deep neural network models for neural machine translation, an effort that led to the creation of encoder-decoder architectures [8]. Very soon, the encoder-decoder architecture became the de facto architecture in neural networks employed for neural machine translation. ...
... When PIV is used inside the decoder, Eq. (7) is infused into Eq. (8). When used inside the encoder, where there is no masking, Eq.(6) is injected into Eq. ...
Preprint
Full-text available
Recently the use of self-attention has yielded state-of-the-art results in vision-language tasks such as image captioning, as well as natural language understanding and generation (NLU and NLG) tasks and computer vision tasks such as image classification. This is because self-attention maps the internal interactions among the elements of the input source and target sequences. Although self-attention successfully calculates the attention values and maps the relationships among the elements of the input source and target sequences, there is no mechanism to control the intensity of attention. In the real world, when communicating with each other face to face or vocally, we tend to express different visual and linguistic contexts with various amounts of intensity. Some words might carry (be spoken with) more stress and weight, indicating the importance of that word in the context of the whole sentence. Based on this intuition, we propose Zoneout Dropout Injection Attention Calculation (ZoDIAC), in which the intensities of attention values for the elements of the input sequence are calculated with respect to the context of the elements of the input sequence. The results of our experiments reveal that employing ZoDIAC leads to better performance in comparison with the self-attention module in the Transformer model. The ultimate goal is to find out whether we could modify the self-attention module in the Transformer model with a method that is potentially extensible to other models that rely on self-attention at their core. Our findings suggest that this particular goal deserves further attention and investigation by the research community. The code for ZoDIAC is available at www.github.com/zanyarz/zodiac .
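To make the discussion of attention "intensity" concrete, here is a generic scaled dot-product self-attention sketch; the `temperature` knob is a hypothetical illustration of intensity control, not the ZoDIAC mechanism itself.

```python
import torch
import torch.nn.functional as F

def self_attention(x, w_q, w_k, w_v, temperature=1.0):
    """Plain scaled dot-product self-attention (Vaswani et al., 2017).

    `temperature` is a hypothetical knob added here only to illustrate the
    idea of controlling attention intensity; values < 1 sharpen the weights.
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v            # each (seq_len, d_k)
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / (d_k ** 0.5)
    weights = F.softmax(scores / temperature, dim=-1)
    return weights @ v
```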
... Bahdanau et al. [1] pioneered the use of the attention mechanism for NMT along with RNN. Sutskever et al. [8] and Luong et al. [9] further advanced the implementation of the attention mechanism in NMT. After the introduction of the attention mechanism, a target token no longer depends only on the same context vector. ...
... if mask is not None then: mmask ← (−1e+9) * (1 − mask); a_ij ← K.Add([a_ij, mmask]); end if; a_ij ← K.expand_dims(a_ij, axis=1); attn.append(a_ij); end for. Model checkpoint files can also be converted into PyTorch bin files with transformers [26]. All the experiments were completed on two NVIDIA Tesla V100 GPUs with 32 GB of memory. ...
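The cleaned pseudocode above corresponds to the standard additive attention-masking trick; a self-contained NumPy rendering (names assumed, standing in for the K.* backend calls) might look like this:

```python
import numpy as np

def masked_softmax(scores: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Additive attention masking as in the pseudocode above: masked positions
    (mask == 0) receive -1e9 before the softmax, so their weights vanish."""
    mmask = -1e9 * (1.0 - mask)      # mmask <- (-1e+9) * (1 - mask)
    scores = scores + mmask          # a_ij <- Add([a_ij, mmask])
    exp = np.exp(scores - scores.max(axis=-1, keepdims=True))
    return exp / exp.sum(axis=-1, keepdims=True)

# Example: the last key position is masked out for every query.
scores = np.random.randn(4, 4)
mask = np.array([1.0, 1.0, 1.0, 0.0])
print(masked_softmax(scores, mask)[:, -1])   # ~0 everywhere
```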
Article
Full-text available
Transformer-based models have made significant advances in neural machine translation (NMT). The main component of the transformer is the multihead attention layer. In theory, more heads enhance the expressive power of the NMT model. But this is not always the case in practice. On the one hand, the computations of each attention head are conducted in the same subspace, without considering the different subspaces of all the tokens. On the other hand, the low-rank bottleneck may occur when the number of heads surpasses a threshold. To address the low-rank bottleneck, the two mainstream methods make the head size equal to the sequence length and complicate the distribution of self-attention heads. However, these methods are challenged by the variable sequence length in the corpus and the sheer number of parameters to be learned. Therefore, this paper proposes the interacting-head attention mechanism, which induces deeper and wider interactions across the attention heads by low-dimension computations in different subspaces of all the tokens, and chooses the appropriate number of heads to avoid the low-rank bottleneck. The proposed model was tested on machine translation tasks of IWSLT2016 DE-EN, WMT17 EN-DE, and WMT17 EN-CS. Compared to the original multihead attention, our model improved the performance by 2.78 BLEU/0.85 WER/2.90 METEOR/2.65 ROUGE_L/0.29 CIDEr/2.97 YiSi and 2.43 BLEU/1.38 WER/3.05 METEOR/2.70 ROUGE_L/0.30 CIDEr/3.59 YiSi on the evaluation set and the test set, respectively, for IWSLT2016 DE-EN, 2.31 BLEU/5.94 WER/1.46 METEOR/1.35 ROUGE_L/0.07 CIDEr/0.33 YiSi and 1.62 BLEU/6.04 WER/1.39 METEOR/0.11 CIDEr/0.87 YiSi on the evaluation set and newstest2014, respectively, for WMT17 EN-DE, and 3.87 BLEU/3.05 WER/9.22 METEOR/3.81 ROUGE_L/0.36 CIDEr/4.14 YiSi and 4.62 BLEU/2.41 WER/9.82 METEOR/4.82 ROUGE_L/0.44 CIDEr/5.25 YiSi on the evaluation set and newstest2014, respectively, for WMT17 EN-CS.
... Deep learning methods have made great progress in machine translation and text generation. In classical Chinese poetry generation, a language model based on a recursive neural network (RNN) [3,4] can learn poetry's structure, semantics, and coherent constraints without additional manual rules concerning limitations of rhyming and tone. However, the RNN-based model has difficulties in generating long sentences because the gradient vanishing problem restricts the RNN to operate only on a short-term memory. The Attention Based Sequence-to-Sequence Model [5,6] was also introduced to study classical Chinese poetry generation. The attention and long-short term memory mechanism [7] in sequence-to-sequence models facilitates semantic consistency through the generated poetry, but it is still difficult for the model to catch effective long-range relationships because the context information of the RNN encoder decays as the time step increases. ...
Article
Full-text available
The computer generation of poetry has been studied for more than a decade. Generating poetry on a human level is still a great challenge for the computer-generation process. We present a novel Transformer-XL-based classical Chinese poetry model that employs a multi-head self-attention mechanism to capture the deeper multiple relationships among Chinese characters. Furthermore, we utilized the segment-level recurrence mechanism to learn longer-term dependency and overcome the context fragmentation problem. To automatically assess the quality of the generated poems, we also built a novel automatic evaluation model that contains a BERT-based module for checking the fluency of sentences and a tone-checker module to evaluate the tone pattern of poems. The poems generated using our model obtained an average score of 9.7 for fluency and 10.0 for tone pattern. Moreover, we visualized the attention mechanism, and it showed that our model learned the tone-pattern rules. All experiment results demonstrate that our poetry generation model can generate high-quality poems.
... Two methods for representation learning of Hi-C data have previously been developed, SNIPER 11 and SCI 12 . SNIPER uses a fully connected autoencoder 15 to transform the sparse Hi-C interchromosomal matrix into a dense one row-wise, the bottleneck of which is assigned as the representation for the corresponding row. SCI 12 treats the Hi-C matrix as a graph and performs graph embedding 16 , aiming to preserve the local and the global structures to form representations for each node. ...
... Pseudo-bulk scHi-C, where many cells are clustered into groups of similar types and pooled in silico, allows for the statistical validation of chromatin patterns. We trained and evaluated Hi-C-LSTM on a subset of chromosomes (15–22) using pseudo-bulk scHi-C data from Ramani et al. 88 (see the "Methods" section for details). A representative heatmap from chromosome 21 shows that Hi-C-LSTM is able to reconstruct contacts faithfully (Supplementary Fig. 11); however, the sparsity of scHi-C data might be a potential concern when using the representations from scHi-C models for other downstream tasks like classification and in-silico manipulation. ...
Article
Full-text available
Despite the availability of chromatin conformation capture experiments, discerning the relationship between the 1D genome and 3D conformation remains a challenge, which limits our understanding of their effect on gene expression and disease. We propose Hi-C-LSTM, a method that produces low-dimensional latent representations that summarize intra-chromosomal Hi-C contacts via a recurrent long short-term memory neural network model. We find that these representations contain all the information needed to recreate the observed Hi-C matrix with high accuracy, outperforming existing methods. These representations enable the identification of a variety of conformation-defining genomic elements, including nuclear compartments and conformation-related transcription factors. They furthermore enable in-silico perturbation experiments that measure the influence of cis-regulatory elements on conformation.
... With the advent of methods based on neural networks, and more specifically, sequence models such as (Cho et al. 2014; Sutskever, Vinyals, and Le 2014; Vaswani et al. 2017), the automatic correction of texts using sequence models witnessed considerable progress in the form of neural sequence models based on characters or words (Rijhwani, Anastasopoulos, and Neubig 2020; Schnober et al. 2016). ...
... The process of training the models is the standard sequence-to-sequence pipeline that uses cross-entropy loss to make the model generate the right token at every step, as proposed in (Cho et al. 2014; Sutskever, Vinyals, and Le 2014). All the models were trained using 4 CPU cores, 4 GB of RAM, and a single NVIDIA V100 GPU with 16 GB of memory. ...
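The "right token at every step" objective mentioned above is ordinary per-step cross-entropy under teacher forcing; a minimal sketch follows, where the shapes and the padding-id convention are assumptions for illustration.

```python
import torch.nn as nn

VOCAB_SIZE = 100
criterion = nn.CrossEntropyLoss(ignore_index=0)  # assume id 0 marks padding

def seq2seq_loss(logits, targets):
    """logits: (batch, tgt_len, VOCAB_SIZE) decoder outputs under teacher forcing;
    targets: (batch, tgt_len) gold token ids shifted one step ahead."""
    return criterion(logits.reshape(-1, VOCAB_SIZE), targets.reshape(-1))
```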
Article
Full-text available
In this paper, we propose a novel method to extend sequence-to-sequence models to accurately process sequences much longer than the ones used during training while being sample- and resource-efficient, supported by thorough experimentation. To investigate the effectiveness of our method, we apply it to the task of correcting documents already processed with Optical Character Recognition (OCR) systems using sequence-to-sequence models based on characters. We test our method on nine languages of the ICDAR 2019 competition on post-OCR text correction and achieve a new state-of-the-art performance in five of them. The strategy with the best performance involves splitting the input document in character n-grams and combining their individual corrections into the final output using a voting scheme that is equivalent to an ensemble of a large number of sequence models. We further investigate how to weigh the contributions from each one of the members of this ensemble. Our code for post-OCR correction is shared at https://github.com/jarobyte91/post_ocr_correction.
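As a toy illustration of the windowed voting scheme this abstract describes, the sketch below assumes a length-preserving correction model (`correct_window` is a stand-in); the actual system's alignment and weighting are more sophisticated.

```python
from collections import Counter

def correct_window(window: str) -> str:
    """Stand-in for a seq2seq correction model; assumed length-preserving here."""
    return window.replace("0", "o")   # toy OCR fix

def vote_correct(text: str, n: int = 8) -> str:
    """Toy version of the voting scheme: every character is corrected inside
    several overlapping n-grams, then decided by majority vote."""
    votes = [[] for _ in text]
    for start in range(max(1, len(text) - n + 1)):
        corrected = correct_window(text[start:start + n])
        for offset, ch in enumerate(corrected):
            votes[start + offset].append(ch)
    return "".join(Counter(v).most_common(1)[0][0] for v in votes)

print(vote_correct("c0gnitive m0del"))   # -> "cognitive model"
```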
... In this novel approach to SH, the effort of producing such SH grammar is lifted from the shoulders of the developers and instead delegated to the NN which infers it from the behavior observed from some BF. The task of SH is reduced to a "sequence-to-sequence" translation task [36], i.e., from { } to {ℎ }. ...
... Although base RNNs are no longer the state of the art in many translation applications, with current solutions mostly utilizing convolutional layers, the encoder-decoder architectures, or relying on the attention mechanism [4,5,14,21,36,42], these still offer a lightweight model compared to more recent techniques. Moreover, as it is later shown, the number of well-formed structural features NN are expected to infer from the SH oracle samples is small. ...
Preprint
With the presence of online collaborative tools for software developers, source code is shared and consulted frequently, from code viewers to merge requests and code snippets. Typically, code highlighting quality in such scenarios is sacrificed in favor of system responsiveness. In these on-the-fly settings, performing a formal grammatical analysis of the source code is not only expensive, but also intractable for the many times the input is an invalid derivation of the language. Indeed, current popular highlighters heavily rely on a system of regular expressions, typically far from the specification of the language's lexer. Due to their complexity, regular expressions need to be periodically updated as more feedback is collected from the users, and their design hinders the detection of more complex language formations. This paper delivers a deep learning-based approach suitable for on-the-fly grammatical code highlighting of correct and incorrect language derivations, such as in code viewers and snippets. It focuses on alleviating the burden on the developers, who can reuse the language's parsing strategy to produce the desired highlighting specification. Moreover, this approach is compared to current online syntax highlighting tools and formal methods in terms of accuracy and execution time, across different levels of grammatical coverage, for three mainstream programming languages. The results obtained show how the proposed approach can consistently achieve near-perfect accuracy in its predictions, thereby outperforming regular expression-based strategies.
... Regarding the non-task-oriented system, it is principally an open-domain dialogue system for entertainment, which aims to generate relevant responses. In open-domain dialogue systems, the traditional seq2seq generation model [7] encodes the dialogue into a fixed vector of knowledge representations, and inputting the same questions will generate the same responses. In contrast, Transformer-based dialogue generation models [8] abandon the basic paradigm of using a circular recursive structure to encode sequences and use a self-attention mechanism to compute the hidden state of a sequence. ...
... Seq2seq [7], breaking through the traditional fixed-size input problem framework, converts an input sequence of indefinite length into an output sequence of indefinite length. It also incorporates a global attention mechanism [27][28] to attend to more textual information; ...
Preprint
Full-text available
Currently, end-to-end deep learning based open-domain dialogue systems remain black-box models, making it easy to generate irrelevant content with data-driven models. Specifically, latent variables are highly entangled with different semantics in the latent space due to the lack of prior knowledge to guide the training. To address this problem, this paper proposes to harness the generative model with prior knowledge through a cognitive approach involving mesoscopic-scale feature disentanglement. Particularly, the model integrates the macro-level guided-category knowledge and micro-level open-domain dialogue data for training, leveraging the prior knowledge in the latent space, which enables the model to disentangle the latent variables within the mesoscopic scale. Besides, we propose a new metric for open-domain dialogues, which can objectively evaluate the interpretability of the latent space distribution. Finally, we validate our model on different datasets and experimentally demonstrate that our model is able to generate higher-quality and more interpretable dialogues than other models.
... Guellil and Azouaou (2016) presented an approach for social media dialectal Arabic identification based on supervised methods, using a pre-built bilingual lexicon of 25,086 words proposed by Guellil and Faical (2017) and Azouaou and Guellil (2017). A different method is presented in Younes et al. (2018), where the authors present a sequence-to-sequence model for Tunisian Arabizi-Arabic character transliteration (Sutskever et al., 2014). As in the case of (Younes et al., 2020), most research on automatic processing of Arabizi involves a preliminary phase of i) corpus collection and ii) model training and testing. ...
Preprint
Full-text available
In this paper we present the final result of a project on Tunisian Arabic encoded in Arabizi, the Latin-based writing system for digital conversations. The project led to the creation of two integrated and independent resources: a corpus and an NLP tool created to annotate the former with various levels of linguistic information: word classification, transliteration, tokenization, POS-tagging, and lemmatization. We discuss our choices in terms of computational and linguistic methodology and the strategies adopted to improve our results. We report on the experiments performed in order to outline our research path. Finally, we explain why we believe in the potential of these resources for both computational and linguistic research. Keywords: Tunisian Arabizi, Annotated Corpus, Neural Network Architecture
... In recent years, neural network-based models have consistently delivered better-quality translations than those generated by phrase-based systems. Transformer-based [1] neural machine translation has achieved state-of-the-art performance in neural machine translation, and it outperforms recurrent neural network (RNN)-based models [2][3][4]. However, recent work [5][6][7] has shown that the Transformer may not learn the linguistic information to the greatest extent possible due to the characteristics of the model, especially in low-resource scenarios. ...
Article
Full-text available
Transformer-based neural machine translation (NMT) has achieved state-of-the-art performance in the NMT paradigm. This method assumes that the model can automatically learn linguistic knowledge (e.g., grammar and syntax) from the parallel corpus via an attention network. However, the attention network cannot capture the deep internal structure of a sentence. Therefore, it is natural to introduce some prior knowledge to guide the model. In this paper, factual relation information is introduced into NMT as prior knowledge, and a novel approach named Factual Relation Augmented (FRA) is proposed to guide the decoder in Transformer-based NMT. In the encoding procedure, a factual relation mask matrix is constructed to generate the factual relation representation for the source sentence, while in the decoding procedure an effective method is proposed to incorporate the factual relation representation and the original representation of the source sentence into the decoder. Positive results obtained in several different translation tasks indicate the effectiveness of the proposed approach.
... But, in this preliminary experience, we will study the opportunity to translate an aphasic corpus to its corrected counterpart. For that, we will train a sequence-to-sequence machine translation model, an approach which has been used widely in the literature on machine translation and other NLP applications (Sutskever et al., 2014; Zhang et al., 2015; Nguyen Le et al., 2017; Mao et al., 2020). We used the corpus we created, APHAFRESH, for training, tuning, and testing. ...
Conference Paper
Full-text available
More than 13 million people suffer a stroke each year. Aphasia is known as a language disorder usually caused by a stroke that damages a specific area of the brain that controls the expression and understanding of language. Aphasia is characterized by a disturbance of the linguistic code affecting encoding and/or decoding of the language. Our project aims to propose a method that helps a person suffering from aphasia to communicate better with those around him. For this, we will propose a machine translation system capable of correcting aphasic errors and helping the patient to communicate more easily. To build such a system, we need a parallel corpus; to our knowledge, this corpus does not exist, especially for French. Therefore, the main challenge and the objective of this task is to build a parallel corpus composed of sentences with aphasic errors and their corresponding corrections. We will show how we create a pseudo-aphasia corpus from real data, and then we will show the feasibility of our project to translate from aphasia data to natural language. The preliminary results show that the deep learning methods we used achieve correct translations corresponding to a BLEU of 38.6.
... Neural Program Synthesis (NPS) typically applies the standard encoder-decoder framework to automatically generate code from a specific input (Ling et al. 2016); in particular, the Seq2Seq architecture (Sutskever et al. 2014) is mostly the elemental model. One challenge of applying the baseline model to NPS is that it does not consider the underlying syntax of the target programming language. ...
Article
Full-text available
Program synthesis is the task of automatically generating programs from user intent, which is one of the central problems in automated software engineering. Recently many researchers use a neural network to learn the distribution over programs based on user intent (such as API and type name), known as neural program synthesis (NPS). The generated programs of NPS are highly dependent on user intent. However, it is difficult for users to provide an accurate and complete intent for the NPS model, which decreases the synthesis accuracy of NPS. Collective Intelligence (CI) is an emerging trend, which illustrates that collective wisdom surpasses individual wisdom. Inspired by CI techniques, we propose an automatic task-specific user intent merging framework for NPS named MerIt (Merge User Intent of Program Synthesis). The key point of our framework is that we propose an improved Unsupervised Ant Colony Optimization (UACO) algorithm to selectively merge effective intent from multiple developers, and design three selection strategies to guide the merge process. The experiments show that our approach is able to provide more adequate and efficient input for NPS and improve the synthesis accuracy. Besides, our evaluation shows that selectively merging knowledge from multiple developers could be a significant way of promoting automated software engineering.
... Due to the temporal memorization property, LSTM units are successfully used in encoder-decoder neural networks to map an input sequence to a target sequence, which is commonly known as the Seq2Seq auto-encoder network (Sutskever et al., 2014). The encoder transforms the input sequence into a fixed-length latent vector by feeding in each element of the input sequence one by one and updating the hidden states, where the last hidden state is considered to contain the summarized information of the entire input sequence. ...
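A minimal sketch of that Seq2Seq auto-encoder idea (sizes assumed, not the paper's exact model): the encoder's final hidden state becomes the fixed-size embedding of a variable-length sequence.

```python
import torch
import torch.nn as nn

# Assumed sizes: sequences of 3-D points, 64-dimensional latent space.
encoder = nn.LSTM(input_size=3, hidden_size=64, batch_first=True)
decoder = nn.LSTM(input_size=64, hidden_size=64, batch_first=True)

streamline = torch.randn(1, 57, 3)   # one sequence of 57 3-D points
_, (h_n, c_n) = encoder(streamline)
embedding = h_n[-1]                  # (1, 64) fixed-size representation
# Training would decode the sequence back from `embedding` and minimize
# reconstruction error; clustering then operates on `embedding`.
```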
Article
Full-text available
Parcellation of whole brain tractograms is a critical step to study brain white matter structures and connectivity patterns. The existing methods based on supervised classification of streamlines into predefined streamline bundle types are not designed to explore sub-bundle structures, and methods with manually designed features are expensive for computing streamline-wise similarities. To resolve these issues, we propose a novel atlas-free method that learns a latent space using a deep recurrent auto-encoder trained in an unsupervised manner. The method efficiently embeds any length of streamline into a fixed-size feature vector, named the streamline embedding, for tractogram parcellation using non-parametric clustering in the latent space. The method was evaluated on the ISMRM 2015 tractography challenge dataset with discrimination of major bundles using clustering algorithms and streamline querying based on similarity, as well as on real tractograms of 102 subjects from the Human Connectome Project. The learnt latent streamline and bundle representations open the possibility of quantitative studies of arbitrary granularity of sub-bundle structures using generic data mining techniques.
... Although deep neural networks can solve difficult learning tasks, they cannot map sequences to sequences. However, an LSTM can map an input sequence to a vector of fixed dimensionality [8]. The main disadvantage of the sequence-to-sequence model is that it produces a fixed-length context vector. ...
Conference Paper
Full-text available
Timbre, content, rhythm, and prosody are four crucial aspects of speech. These aspects control how a person speaks. When two speakers utter the same content, the other three aspects control the differences in their speech. Converting any person's voice to a targeted speaker's voice by controlling these three parameters is the main work presented in this paper. Voice synthesized by a text-to-speech system trained on a small amount of data sounds robotic and foggy. Producing a new dataset for adding a new speaker to the system is costly. The proposed method is an end-to-end system based on a multi-domain Google WaveNet auto-encoder with a disentangled latent space and a shared encoder, trained on a Nepali speech dataset. The use of an auto-encoder can remove noise from the audio. The encoder part of the auto-encoder transforms audio into a latent space representation, whereas the decoder side decodes the latent space representation back into the voice of the targeted speaker. The rhythm, prosody, and timbre of the targeted speaker's voice are modified artificially. The network is trained in an unsupervised way to recover these modified aspects back to the original speech. The attention mechanism network is expected to recover the timing of the speaker in order to match the prosody of the targeted speaker. The model is trained on 480 recordings from 17 speakers, followed by training on 1500 recordings of a single speaker extracted from YouTube, for 5000 epochs. The correlation of the synthesized audio with the recorded speech of the targeted speaker is found to be 0.78. An evaluation of quality by mean opinion score yields a score of 2.78. Increasing the size of the dataset, the clarity of the recorded audio samples, and the number of training epochs can further increase the naturalness of the converted speech.
... Wen et al. (2017) proposed a method for implementing all the functions of NLU, DST, Policy, and NLG modules by using neural networks, enabling the entire system to be trained. Lei et al. (2018) incorporated both a decoder for generating belief states (i.e., DST module) and a response-generation decoder (i.e., NLG module) into a sequence-to-sequence model (Sutskever et al., 2014). Zhang et al. (2020a) also proposed a method for jointly optimizing a system that includes three decoders that respectively execute the functions of DST, Policy, and NLG. ...
Preprint
Many studies have proposed methods for optimizing the dialogue performance of an entire pipeline task-oriented dialogue system by jointly training modules in the system using reinforcement learning. However, these methods are limited in that they can only be applied to modules implemented using trainable neural-based methods. To solve this problem, we propose a method for optimizing a pipeline system composed of modules implemented with arbitrary methods for dialogue performance. With our method, neural-based components called post-processing networks (PPNs) are installed inside such a system to post-process the output of each module. All PPNs are updated to improve the overall dialogue performance of the system by using reinforcement learning, without requiring each module to be differentiable. Through dialogue simulation and human evaluation on the MultiWOZ dataset, we show that our method can improve the dialogue performance of pipeline systems consisting of various modules.
... In recent years, neural networks have become empirically successful in a wide range of supervised learning applications, such as computer vision (Krizhevsky, Sutskever, and Hinton 2012; Szegedy et al. 2015), speech recognition, natural language processing (Sutskever, Vinyals, and Le 2014) and computational paralinguistics. Standard implementations of training feed-forward neural networks for classification are based on gradient-based stochastic optimization, usually optimizing the empirical cross-entropy loss (Hinton 1989). ...
Article
Full-text available
When humans learn a new concept, they might ignore examples that they cannot make sense of at first, and only later focus on such examples, when they are more useful for learning. We propose incorporating this idea of tunable sensitivity for hard examples in neural network learning, using a new generalization of the cross-entropy gradient step, which can be used in place of the gradient in any gradient-based training method. The generalized gradient is parameterized by a value that controls the sensitivity of the training process to harder training examples. We tested our method on several benchmark datasets. We propose, and corroborate in our experiments, that the optimal level of sensitivity to hard examples is positively correlated with the depth of the network. Moreover, the test prediction error obtained by our method is generally lower than that of the vanilla cross-entropy gradient learner. We therefore conclude that tunable sensitivity can be helpful for neural network learning.
... Currently, sequence-to-sequence (Seq2Seq) models [12,13] based on deep neural networks provide an end-to-end text generation framework for NLP and are widely used in headline generation [14], speech recognition [15], machine translation (MT) [16], and other fields. Most automatic text summarisation approaches treat abstractive summarisation as an MT task, so the encoder-decoder framework commonly used in MT is extensively used for the abstractive summarisation task. ...
Article
Full-text available
The existing abstractive text summarisation models only consider the word-sequence correlations between the source document and the reference summary, and the summary generated by such models fails to cover the subject of the source document due to the models' narrow perspective. To make up for these disadvantages, a multi-domain attention pointer (MDA-Pointer) abstractive summarisation model is proposed in this work. First, the model uses bidirectional long short-term memory to encode the word and sentence sequences of the source document, respectively, to obtain semantic representations at the word and sentence levels. Furthermore, a multi-domain attention mechanism between the semantic representations and the summary words is established, and the proposed model can generate summary words under the proposed attention mechanism based on the words and sentences. Then, words are extracted from the vocabulary or the original word sequences through the pointer network to form the summary, and a coverage mechanism is introduced at the word and sentence levels, respectively, to reduce the redundancy of the summary content. Finally, experimental validation is conducted on the CNN/Daily Mail dataset. The ROUGE evaluation indexes of the model without and with the coverage mechanism are improved, respectively, and the results verify the validity of the model proposed in this paper.
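For context, the copy-versus-generate choice described above is typically implemented as a pointer-generator mixture; the sketch below follows the standard formulation in the style of See et al. (2017) as an illustration, not the exact MDA-Pointer equations.

```python
import torch

def pointer_generator_dist(p_gen, vocab_dist, attn_weights, src_ids):
    """Standard pointer-generator mixture (See et al., 2017 style); shown only
    to illustrate copying from the source vs. generating from the vocabulary.

    p_gen:        (batch, 1) probability of generating from the vocabulary
    vocab_dist:   (batch, vocab_size) softmax over the vocabulary
    attn_weights: (batch, src_len) attention over source positions
    src_ids:      (batch, src_len) int64 ids of the source tokens
    """
    generate = p_gen * vocab_dist          # mass kept on the vocabulary
    copy = (1.0 - p_gen) * attn_weights    # mass moved to source tokens
    # Scatter copy probabilities onto the vocabulary slots of the source tokens.
    return generate.scatter_add(1, src_ids, copy)
```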
... We applied a simple recurrent neural network (RNN), a machine learning model with memory [35], to reconstruct the risk profile for cancer-specific mortality (risk profile). Here, right-censoring was used for model development [36]. ...
Article
Full-text available
Background: Prognostication is essential to determine the risk profile of patients with urologic cancers. Methods: We utilized the SEER national cancer registry database with approximately 2 million patients diagnosed with urologic cancers (penile, testicular, prostate, bladder, ureter, and kidney). The cohort was randomly divided into the development set (90%) and the out-held test set (10%). Modeling algorithms and clinically relevant parameters were utilized for cancer-specific mortality prognosis. The model fitness for the survival estimation was assessed using the differences between the predicted and observed Kaplan-Meier estimates on the out-held test set. The overall concordance index (c-index) score estimated the discriminative accuracy of the survival model on the test set. A simulation study assessed the estimated minimum follow-up duration and time points with the risk stability. Results: We achieved a well-calibrated prognostic model with an overall c-index score of 0.800 (95% CI: 0.795-0.805) on the representative out-held test set. The simulation study revealed that the suggestions for the follow-up duration covered the minimum duration and differed by the tumor dissemination stages and affected organs. Time points with a high likelihood for risk stability were identifiable. Conclusions: A personalized temporal survival estimation is feasible using artificial intelligence and has potential application in clinical settings, including surveillance management.
... Deep learning is an artificial intelligence (AI) technique that is regarded as one of the top technological breakthroughs in computer science [1], [2]. It imitates the working principle of the human brain in processing data and forming knowledge patterns and produces promising results for various tasks such as image classification [3], [4], speech recognition [5], recommendation [6], and natural language understanding [7], [8]. Nowadays deep learning is increasingly applied in many fields where trustworthiness is critically needed, such as cybersecurity [9], autonomous driving [10], and the healthcare industry [11]. ...
Preprint
Neural networks have been widely applied in security applications such as spam and phishing detection, intrusion prevention, and malware detection. This black-box method, however, often has uncertainty and poor explainability in applications. Furthermore, neural networks themselves are often vulnerable to adversarial attacks. For those reasons, there is a high demand for trustworthy and rigorous methods to verify the robustness of neural network models. Adversarial robustness, which concerns the reliability of a neural network when dealing with maliciously manipulated inputs, is one of the hottest topics in security and machine learning. In this work, we survey existing literature in adversarial robustness verification for neural networks and collect 39 diversified research works across machine learning, security, and software engineering domains. We systematically analyze their approaches, including how robustness is formulated, what verification techniques are used, and the strengths and limitations of each technique. We provide a taxonomy from a formal verification perspective for a comprehensive understanding of this topic. We classify the existing techniques based on property specification, problem reduction, and reasoning strategies. We also demonstrate representative techniques that have been applied in existing studies with a sample model. Finally, we discuss open questions for future research.
... In the prediction, at least three factors would affect each agent's dynamics: ego momentum, instantaneous intent, and social influence from the other agents. The first factor has been well studied [9]; the second factor is unpredictable; and the third factor is an emerging research topic and the focus of this work. To demystify the social influence, we need to model and reason the interactions among agents based on their past spatio-temporal states, potentially leading to precise and interpretable trajectory prediction. ...
Preprint
Full-text available
Demystifying the interactions among multiple agents from their past trajectories is fundamental to precise and interpretable trajectory prediction. However, previous works mainly consider static, pair-wise interactions with limited relational reasoning. To promote more comprehensive interaction modeling and relational reasoning, we propose DynGroupNet, a dynamic-group-aware network, which can i) model time-varying interactions in highly dynamic scenes; ii) capture both pair-wise and group-wise interactions; and iii) reason about both interaction strength and category without direct supervision. Based on DynGroupNet, we further design a prediction system to forecast socially plausible trajectories with dynamic relational reasoning. The proposed prediction system leverages the Gaussian mixture model, multiple sampling and prediction refinement to promote prediction diversity, training stability and trajectory smoothness, respectively. Extensive experiments show that: 1) DynGroupNet can capture time-varying group behaviors and infer time-varying interaction categories and interaction strengths during trajectory prediction without any relation supervision on physical simulation datasets; 2) DynGroupNet outperforms the state-of-the-art trajectory prediction methods by a significant improvement of 22.6%/28.0%, 26.9%/34.9%, 5.1%/13.0% in ADE/FDE on the NBA, NFL Football and SDD datasets and achieves state-of-the-art performance on the ETH-UCY dataset.
... The transformer-based approach popularized by Vaswani et al. (2017) is a particular instance of the sequence-to-sequence (Seq2Seq) recurrent neural bipartite encoder-decoder architecture (Cho et al., 2014; Sutskever et al., 2014) equipped with a multi-head attention mechanism. It has the advantages inherent to Seq2Seq models. ...
Conference Paper
Full-text available
The SIGTYP 2022 shared task concerns the problem of word reflex generation in a target language, given cognate words from a subset of related languages. We present two systems to tackle this problem, covering two very different modeling approaches. The first model extends transformer-based encoder-decoder sequence-to-sequence modeling, by encoding all available input cognates in parallel, and having the decoder attend to the resulting joint representation during inference. The second approach takes inspiration from the field of image restoration, where models are tasked with recovering pixels in an image that have been masked out. For reflex generation, the missing reflexes are treated as "masked pixels" in an "image" which is a representation of an entire cognate set across a language family. As in the image restoration case, cognate restoration is performed with a convolutional network.
... This is because most of the existing networks can be regarded as the composition of multiple basic units, even for those with complex structures, e.g., seq2seq (Sutskever et al., 2014), R-Net (Wang et al., 2017), ResNet (He et al., 2016) and DenseNet (Huang et al., 2017). Further, complex structures give us more freedom to decide which part of the parameters should be generated. ...
Conference Paper
Full-text available
Learning multiple tasks sequentially is important for the development of AI and lifelong learning systems. However, standard neural network architectures suffer from catastrophic forgetting which makes it difficult for them to learn a sequence of tasks. Several continual learning methods have been proposed to address the problem. In this paper, we propose a very different approach, called Parameter Generation and Model Adaptation (PGMA), to dealing with the problem. The proposed approach learns to build a model, called the solver, with two sets of parameters. The first set is shared by all tasks learned so far and the second set is dynamically generated to adapt the solver to suit each test example in order to classify it. Extensive experiments have been carried out to demonstrate the effectiveness of the proposed approach.
... RNN has memory to process a sequence of inputs, and hence has strong capabilities for time series tasks. A typical RNN-based model, namely the seq2seq model, is suited to dealing with sequence-based "N inputs and K outputs" tasks (N ≠ K) and learns mappings between unequal-length sequences due to its specific architecture [18,19]. Thus, we adopted the seq2seq model to perform our RLL decoding task. ...
Article
Full-text available
Unmanned aerial vehicles (UAVs) equipped with visible light communication (VLC) technology can simultaneously offer flexible communications and illumination to serve ground users. Since a poor UAV working environment increases interference on the VLC link, there is a pressing need to further ensure reliable data communications. Run-length limited (RLL) codes are commonly utilized to ensure reliable data transmission and flicker-free perception in VLC technology. Conventional RLL decoding methods depend upon look-up tables, which can be prone to erroneous transmissions. This paper proposes a novel recurrent neural network (RNN)-based decoder for RLL codes that uses sequence to sequence (seq2seq) models. With a well-trained model, the decoder has a significant performance advantage over the look-up table method, and it can approach the bit error rate of maximum a posteriori (MAP) criterion-based decoding. Moreover, the decoder is used to deal with multiple frames simultaneously, such that all RLL-coded frames can be decoded by only one-shot decoding within one time slot, which enhances the system throughput. This shows our decoder's great potential for practical UAV applications with VLC technology.
... Further explanations of the encoder-decoder used here can be found in [5,6,14]. ...
Conference Paper
Full-text available
The price of a commodity, such as electricity, is determined on a commodity market. A market is efficient when the supply and demand in the market are at an equilibrium. Efficient markets run on information. Information can cause a spontaneous and instantaneous change in the supply and demand in a market. The market communicates this new equilibrium through the change in the price of a commodity. In the electricity market, the supplier and consumer communicate through electrical load profiles. A load profile signals when and how much energy should be consumed within a certain time frame without causing a change in the price of electricity. Creating such load profiles is commonly done by the energy supplier by means of standard load profiles. Here we propose a data-driven, simulation-based method that allows the consumer to create its own specific load profile, which potentially will bring down the cost of energy consumed.
... This is the encoder-decoder architecture, as shown in Figure 4, and the lengths of the input and output are variable. The related applications, including Google machine translation (Sutskever et al., 2014) and Google speech recognition (Prabhavalkar et al., 2017), are very versatile. ...
Article
Full-text available
The manufacturing process is defined by the synchronous matching and mutual support of the event logic and the task context, so that the work task can be completed perfectly by executing each step of the manufacturing process. However, during the manufacturing process in a traditional production environment, on-site personnel often face situations where on-site advice is required due to a lack of experience or knowledge. Therefore, the functions of the manufacturing process should be more closely connected with the workers and tasks. To improve manufacturing efficiency and reduce the error rate, this research proposes a manufacturing work knowledge framework that integrates an intelligent assisted-learning system into the manufacturing process. Through Augmented Reality (AR) technology, object recognition is used to identify the components within the line of sight, and the assembly steps are presented visually. During the manufacturing process, the system can still give feedback to the user through animation, so as to achieve a function equivalent to on-the-spot guidance and the assistance of a specialist when a particular problem needs to be solved. Research experiments show that with this intelligent assisted-learning interface, users can more quickly understand how the manufacturing process works and solve problems, which greatly alleviates the issue of personnel having insufficient experience and knowledge.
... Sequence-to-sequence (Seq2Seq) models were motivated by problems involving sequences of unknown lengths (Sutskever et al. 2014). Although they were initially applied to machine translation, they can be applied to many different applications involving sequence modelling. ...
Article
Full-text available
Deep reinforcement learning (DRL) is poised to revolutionise the field of artificial intelligence (AI) by endowing autonomous systems with high levels of understanding of the real world. Currently, deep learning (DL) is enabling DRL to effectively solve various intractable problems in various fields including computer vision, natural language processing, healthcare, robotics, to name a few. Most importantly, DRL algorithms are also being employed in audio signal processing to learn directly from speech, music and other sound signals in order to create audio-based autonomous systems that have many promising applications in the real world. In this article, we conduct a comprehensive survey on the progress of DRL in the audio domain by bringing together research studies across different but related areas in speech and music. We begin with an introduction to the general field of DL and reinforcement learning (RL), then progress to the main DRL methods and their applications in the audio domain. We conclude by presenting important challenges faced by audio-based DRL agents and by highlighting open areas for future research and investigation. The findings of this paper will guide researchers interested in DRL for the audio domain.
... In contrast, the DRL framework does not require an explicit distance matrix, and only one feed-forward pass of the network will update the routes based on the new data generated by environmental interactions. Vinyals et al. [13] improved the sequence-to-sequence (Seq2Seq) model [14] and proposed a pointer network with a long short-term memory (LSTM) network as the encoder and an AM as the decoder. It can effectively solve the small-scale TSP. ...
Article
Full-text available
Recent research has shown that deep reinforcement learning (DRL) can be used to design better heuristics for the traveling salesman problem (TSP) on a small scale, but does not do well when generalized to large instances. In order to improve the generalization ability of the model when the number of nodes changes from small to large, we propose a dynamic graph Conv-LSTM model (DGCM) to solve the large-scale TSP. The notable feature of our model is the use of a dynamic encoder-decoder architecture and a convolutional long short-term memory network, which enable the model to capture the topological structure of the graph dynamically, as well as the potential relationships between nodes. In addition, we propose a dynamic positional encoding layer in the DGCM, which can improve the quality of solutions by providing more location information. The experimental results show that the performance of the DGCM on the large-scale TSP surpasses the state-of-the-art DRL-based methods and yields good performance when generalized to real-world datasets. Moreover, our model compares favorably to heuristic algorithms and professional solvers in terms of computational time.
... Our model is based on Tacotron 2 [14], i.e., it is sequence-to-sequence and attention-based [15], and consists of three main parts: an encoder, a decoder and an attention mechanism (Figure 1). ...
Preprint
Training multilingual Neural Text-To-Speech (NTTS) models using only monolingual corpora has emerged as a popular way of building voice cloning based Polyglot NTTS systems. In order to train these models, it is essential to understand how the composition of the training corpora affects the quality of multilingual speech synthesis. In this context, it is common to hear questions such as "Would including more Spanish data help my Italian synthesis, given the closeness of both languages?". Unfortunately, we found the existing literature on the topic incomplete in this regard. In the present work, we conduct an extensive ablation study aimed at understanding how various factors of the training corpora, such as language family affiliation, gender composition, and the number of speakers, contribute to the quality of Polyglot synthesis. Our findings include the observation that female speaker data are preferred in most scenarios, and that it is not always beneficial to have more speakers from the target language variant in the training corpus. The findings herein are informative for the process of data procurement and corpus building.
... The model is based on an autoregressive neural network [44,45] (the autoregressive recurrent network architecture). The model distribution $Q_\Theta(\mathbf{z}_{i,t_0:T} \mid \mathbf{z}_{i,1:t_0-1}, \mathbf{x}_{i,1:T})$ consists of multiple likelihood factors, as in (16): ...
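Equation (16) itself is elided in the excerpt. A plausible reconstruction, following the standard deep autoregressive (DeepAR-style) formulation, factorizes the distribution into per-step likelihood terms:

$$
Q_\Theta\!\left(\mathbf{z}_{i,t_0:T} \mid \mathbf{z}_{i,1:t_0-1}, \mathbf{x}_{i,1:T}\right)
= \prod_{t=t_0}^{T} \ell\!\big(z_{i,t} \mid \theta(\mathbf{h}_{i,t}, \Theta)\big),
\qquad
\mathbf{h}_{i,t} = h\!\big(\mathbf{h}_{i,t-1}, z_{i,t-1}, \mathbf{x}_{i,t}, \Theta\big),
$$

where $\ell$ is a parametric likelihood (e.g., Gaussian) and $\mathbf{h}_{i,t}$ is the recurrent network state.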
Article
Full-text available
Aero-engine casing is a kind of thin-walled rotary part that often deforms seriously during machining. As deformation force is an important physical quantity associated with deformation, using the deformation force to control deformation has been suggested. However, due to the complex machining characteristics of an aero-engine casing, obtaining a stable and reliable deformation force can be quite difficult. To address this issue, this paper proposes a deformation force monitoring method via a pre-support force probabilistic decision model based on a deep autoregressive neural network and a Kalman filter, for which a set of sophisticated clamping devices with force sensors was specifically developed. In the proposed method, the pre-support force is determined by the predicted value of the deformation force and the equivalent flexibility of the part, while measurement errors and reality gaps are reduced by the Kalman filter fusing the predicted and measured data. Both computer simulation and physical machining experiments were carried out, and their results confirm the effectiveness of the proposed method. In the simulation experiments, at a confidence of 84.1%, the success rate of deformation force monitoring increased by about 30% compared with the traditional approach, and the final clamping-induced deformation under the proposed method was less than 0.003 mm. In the real machining experiments, the deformation calculated by the proposed method from the monitored deformation force had an error of less than 0.008 mm.
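The fusion step described, combining a predicted deformation force with a sensor measurement via a Kalman filter, follows the standard predict/update pattern. Below is a minimal scalar sketch; the paper's actual state model and noise parameters are not specified here.

```python
# Minimal scalar Kalman-filter fusion of a predicted and a measured value.
# Illustrates the predict/update pattern only, not the paper's model.
def kalman_update(x_pred, p_pred, z_meas, r_meas):
    """Fuse a prediction (x_pred, variance p_pred) with a measurement
    (z_meas, noise variance r_meas); return fused estimate and variance."""
    k = p_pred / (p_pred + r_meas)        # Kalman gain
    x_new = x_pred + k * (z_meas - x_pred)
    p_new = (1.0 - k) * p_pred
    return x_new, p_new

# Example: a prior deformation-force estimate of 120 N (variance 25) fused
# with a sensor reading of 128 N (variance 9) is pulled toward the sensor.
x, p = kalman_update(120.0, 25.0, 128.0, 9.0)
```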
... Since the introduction of the Transformer (Vaswani et al., 2017), Neural Machine Translation (NMT) has become the state-of-the-art approach to machine translation (Bahdanau et al., 2014; Jean et al., 2014; Sutskever et al., 2014; Kalchbrenner and Blunsom, 2013). NMT has shown promising results compared to traditional approaches such as Statistical Machine Translation (SMT) (Philipp et al., 2003). ...
Preprint
Indonesian is an agglutinative language, with a compounding process of word formation. A translation model for this language therefore requires a mechanism below the word level, referred to as the sub-word level. The compounding process leads to a rare word problem, since the vocabulary size explodes. We propose a strategy to address this rare word problem in a neural machine translation (NMT) system that uses Indonesian as one side of the language pair. Our approach uses a rule-based method to transform a word into its root and accompanying affixes while retaining its meaning and context. A rule-based algorithm has a further advantage: it does not require corpus data, only the standard Indonesian morphological rules. Our experiments confirm that this method is practical. It reduces the vocabulary size significantly, by up to 57%, and on English-to-Indonesian translation it provides an improvement of up to 5 BLEU points over a similar NMT system that does not use this technique.
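A rule-based root-and-affix split of the kind described might look like the toy sketch below; the affix lists are a small illustrative subset of Indonesian morphology, not the paper's actual rule set.

```python
# Toy rule-based affix splitter in the spirit of the described approach.
# PREFIXES/SUFFIXES are an illustrative subset, not the paper's rules.
PREFIXES = ("meng", "mem", "men", "me", "ber", "di", "ke", "pe")
SUFFIXES = ("kan", "an", "i", "nya")

def split_affixes(word):
    """Split a word into prefix, root, and suffix sub-word tokens."""
    pres, sufs = [], []
    changed = True
    while changed:
        changed = False
        for p in PREFIXES:
            # Strip a prefix only if a plausible root (4+ chars) remains.
            if word.startswith(p) and len(word) > len(p) + 3:
                pres.append(p + "+")
                word = word[len(p):]
                changed = True
                break
        for s in SUFFIXES:
            if word.endswith(s) and len(word) > len(s) + 3:
                sufs.insert(0, "+" + s)
                word = word[:-len(s)]
                changed = True
                break
    return pres + [word] + sufs

# e.g. split_affixes("memberikan") -> ["mem+", "beri", "+kan"]
```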
... Data-driven methods, and specifically deep learning, can provide a viable solution to the challenges of pushover PSDM mapping. Deep learning is a subfield of artificial intelligence that is used to perform complex mappings through representation learning, such as image-to-image [31] and sequence-to-sequence [32] translations. Among different modeling techniques, encoder-decoder models are prevalent in sequence-to-sequence translation, where the encoder encodes the first dimension while the decoder reconstructs the target dimension from encoded features. ...
Article
Full-text available
Probabilistic seismic demand analysis (PSDA) is the most time- and effort-intensive step in risk-based assessment of the built environment. A typical PSDA requires subjecting the structure to a large number of ground motions and performing nonlinear dynamic analysis, where the analysis dimension and effort substantially increase at large-scale assessments such as community-level evaluations. This study presents a deep learning framework to estimate seismic demand models from nonlinear static (i.e., pushover) analysis, which is computationally inexpensive. The proposed architecture leverages an encoder-decoder model with customized training schedules and a loss function capable of determining demand model parameters and error. Furthermore, the framework facilitates the seamless incorporation of structural modeling uncertainties in PSDA. The proposed framework is then applied to a building inventory consisting of 720 concrete frames to examine its generalizability and accuracy. The results show that the deep learning architecture can estimate demand models with an R² of 84% using a test-to-train ratio of unity. In addition, the average prediction error is less than 3% and 6% for the demand model slope and intercept parameters, respectively, translating into an accurate estimation of fragility functions with a median error of 5.7%, 6.9%, and 6.8% for the immediate occupancy, life safety, and collapse prevention damage states. Lastly, the framework can efficiently propagate structural uncertainties into seismic demand models, capturing the implicit relationship between the frames' nonlinear characteristics and the resultant fragility functions.
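The slope and intercept terminology refers to the conventional power-law seismic demand model; a standard form (stated here for context, not quoted from the paper) relates an engineering demand parameter $EDP$ to an intensity measure $IM$ as

$$
\ln \widehat{EDP} = \ln a + b \,\ln IM,
\qquad
P\!\left[EDP > d \mid IM\right] = 1 - \Phi\!\left(\frac{\ln d - \ln \widehat{EDP}}{\beta_{D\mid IM}}\right),
$$

where $a$ and $b$ are the intercept and slope parameters and $\beta_{D\mid IM}$ is the lognormal dispersion captured by the demand model error.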
... High capacity deep learning models have excelled at data-to-text generation (Wiseman et al., 2017; Parikh et al., 2020), summarization (See et al., 2017), and task-oriented dialogue modeling (Wen et al., 2018). Recent approaches have shown great promise thanks to neural encoder-decoder models (Bahdanau et al., 2014; Sutskever et al., 2014), Transformer-based architectures (Vaswani et al., 2017), and large-scale pretraining (Liu and Lapata, 2019b; Zhang et al., 2020). In the canonical formulation of conditional generation, source content (e.g., a document, or dialogue history) is encoded with a neural architecture, while the decoder autoregressively produces a token at each output position based on its internal state and attention mechanisms (Bahdanau et al., 2014; Luong et al., 2015). ...
Preprint
Full-text available
The ability to convey relevant and faithful information is critical for many tasks in conditional generation and yet remains elusive for neural seq-to-seq models, whose outputs often reveal hallucinations and fail to correctly cover important details. In this work, we advocate planning as a useful intermediate representation for rendering conditional generation less opaque and more grounded. Our work proposes a new conceptualization of text plans as a sequence of question-answer (QA) pairs. We enhance existing datasets (e.g., for summarization) with a QA blueprint operating as a proxy for both content selection (i.e., what to say) and planning (i.e., in what order). We obtain blueprints automatically by exploiting state-of-the-art question generation technology and converting input-output pairs into input-blueprint-output tuples. We develop Transformer-based models, each varying in how they incorporate the blueprint in the generated output (e.g., as a global plan or iteratively). Evaluation across metrics and datasets demonstrates that blueprint models are more factual than alternatives that do not resort to planning and allow tighter control of the generation output.
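A minimal illustration of an input-blueprint-output tuple is shown below; the field names and QA pairs are invented for illustration and are not drawn from the paper's datasets.

```python
# Hypothetical shape of an input-blueprint-output tuple: the blueprint is
# an ordered list of question-answer pairs acting as a content plan.
example = {
    "input": "Full source document to be summarized ...",
    "blueprint": [
        ("Who is the report about?", "the city council"),
        ("What did it decide?", "to approve the new transit budget"),
    ],
    "output": "The city council approved the new transit budget ...",
}
```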
... Specifically, the natural language understanding module aims to classify the user's intentions, and then the dialogue state tracking module tracks the current state and fills in the predefined slots. Thereafter, the policy learning module predicts the next action on the basis of the current state representation, and the natural language generation module returns the response through generation methods [13], [14], [15] or predefined templates. Despite the remarkable success of the pipeline-based methods, they are prone to error propagation [16] and heavy dependence on the sequential modules [17]. ...
Preprint
Text response generation for multimodal task-oriented dialog systems, which aims to generate a proper text response given the multimodal context, is an essential yet challenging task. Although existing efforts have achieved compelling success, they still suffer from two pivotal limitations: 1) they overlook the benefit of generative pre-training, and 2) they ignore knowledge related to the textual context. To address these limitations, we propose a novel dual knowledge-enhanced generative pretrained language model for multimodal task-oriented dialog systems (DKMD), consisting of three key components: dual knowledge selection, dual knowledge-enhanced context learning, and knowledge-enhanced response generation. To be specific, the dual knowledge selection component selects the related knowledge according to both the textual and visual modalities of the given context. Thereafter, the dual knowledge-enhanced context learning component seamlessly integrates the selected knowledge into multimodal context learning from both global and local perspectives, where the cross-modal semantic relation is also explored. Moreover, the knowledge-enhanced response generation component comprises a revised BART decoder, where an additional dot-product knowledge-decoder attention sub-layer is introduced to explicitly utilize the knowledge to advance text response generation. Extensive experiments on a public dataset verify the superiority of the proposed DKMD over state-of-the-art competitors.
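What a dot-product knowledge-decoder attention sub-layer computes can be sketched generically as scaled dot-product attention with decoder states as queries and knowledge embeddings as keys and values; this is the general pattern, not DKMD's exact layer.

```python
# Generic scaled dot-product attention over knowledge embeddings.
# Sketch of the pattern only; not DKMD's actual sub-layer.
import math
import torch

def knowledge_attention(dec_states, knowledge, mask=None):
    # dec_states: (batch, tgt_len, d); knowledge: (batch, n_facts, d)
    d = dec_states.size(-1)
    scores = dec_states @ knowledge.transpose(-2, -1) / math.sqrt(d)
    if mask is not None:
        scores = scores.masked_fill(mask, float("-inf"))
    # Weighted sum of knowledge vectors for each decoder position.
    return torch.softmax(scores, dim=-1) @ knowledge
```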
Article
Context: Source code summarization is a crucial yet far-from-settled task for describing structured code snippets in natural language. High-quality code summaries can effectively facilitate program comprehension and software maintenance. A good code summary should have the following characteristics: complete information, correct meaning, and consistent description. In recent years, numerous approaches have been proposed for code summarization, but it is still very challenging to automatically learn the complex semantics of source code and generate complete, correct and consistent code summaries. Objective: In this paper, we propose KGCodeSum, a novel keyword-guided abstractive code summarization approach that incorporates structural and contextual information. Methods: To improve summaries' quality, we leverage both the structural information embedded in the code itself and the contextual information from related code snippets. Meanwhile, we use keywords to guide summary generation, guaranteeing that the code summaries contain the key information. Finally, we propose a new dynamic vocabulary strategy that effectively resolves the UNK problem in code summaries. Results: Through our evaluation on large-scale benchmark datasets with 2.1 million Java method-comment pairs and 1.1 million C/C++ function-summary pairs, we observed that our approach generates better code summaries than existing state-of-the-art approaches in terms of completeness, correctness and consistency. In addition, we find that incorporating the dynamic vocabulary strategy significantly saves time and space during model training. Conclusion: Our KGCodeSum approach can effectively generate code summaries.
Article
Multi-turn conversations commonly face the challenges of adequately expressing emotions and of enriching or expanding the content of the dialogue. Existing knowledge-based dialogue models cannot avoid producing overly rational responses or neglecting the appropriate expression of emotion, yet precise emotional expression in a conversational bot is crucial to improving user satisfaction. In our work, we design an auto-emotion prediction mechanism for a chatting machine that considers six common fine-grained emotions and automatically assigns a corresponding emotion to each response according to historical emotion lines and contexts. Moreover, we dynamically fuse multiple forms of knowledge to enrich responses, taking advantage of both the abundant unstructured latent knowledge in documents and the information expansion capabilities of a structured knowledge graph. Notably, for the unstructured knowledge, a novel virtual knowledge base is constructed and a new delayed updating algorithm is proposed. We present a new dialogue generation model, Automatically predicting emotion based dynamic multi-form knowledge fusion Conversation Generation (Apeak-CG). Both automatic and human evaluation results indicate the effectiveness of our method in terms of emotional coherence and informativeness.
Article
Text-to-Speech Synthesis (TTS) is an active area of research on generating synthetic speech from underlying text. The identified syllables are uttered with proper duration and prosody characteristics to emulate natural speech. It falls under the category of Natural Language Processing (NLP), which aims to bridge the communication gap between human and machine. As far as Western languages like English are concerned, research on producing intelligible and natural synthetic speech has advanced considerably. But in a multilingual country like India, many regional languages, such as Malayalam, remain underexplored in NLP. In this article, we bring together the major research works in TTS for English and the prominent Indian languages, with special emphasis on the South Indian language Malayalam. This review intends to give direction to TTS research activities in the language.
Article
Full-text available
Over the years, considerable research has been conducted to investigate the mechanisms of speech perception and recognition. Electroencephalography (EEG) is a powerful tool for identifying brain activity; therefore, it has been widely used to determine the neural basis of speech recognition. In particular, for the classification of speech recognition, deep learning-based approaches are in the spotlight because they can automatically learn and extract representative features through end-to-end learning. This study aimed to identify particular components that are potentially related to phoneme representation in the rat brain and to discriminate brain activity for each vowel stimulus on a single-trial basis using a bidirectional long short-term memory (BiLSTM) network and classical machine learning methods. Nineteen male Sprague-Dawley rats underwent microelectrode implantation surgery to record EEG signals from the bilateral anterior auditory fields. Five vowel speech stimuli with highly different formant frequencies were chosen: /a/, /e/, /i/, /o/, and /u/. EEG recorded under randomly presented vowel stimuli was minimally preprocessed and normalized by a z-score transformation for use as input to the classifiers. The BiLSTM network showed the best performance among the classifiers, achieving an overall accuracy, f1-score, and Cohen's κ of 75.18%, 0.75, and 0.68, respectively, under 10-fold cross-validation. These results indicate that LSTM layers can effectively model sequential data such as EEG; hence, informative features can be derived through a BiLSTM trained end-to-end without any additional hand-crafted feature extraction.
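The described pipeline (z-scored EEG windows fed to a BiLSTM whose output feeds a classifier over the five vowels) can be sketched as follows; the layer sizes are assumptions for illustration, not the paper's configuration.

```python
# Minimal BiLSTM sequence classifier sketch in PyTorch; sizes illustrative.
import torch
import torch.nn as nn

class BiLSTMClassifier(nn.Module):
    def __init__(self, n_channels, hidden=64, n_classes=5):
        super().__init__()
        self.lstm = nn.LSTM(n_channels, hidden, batch_first=True,
                            bidirectional=True)
        self.head = nn.Linear(2 * hidden, n_classes)

    def forward(self, x):
        # x: (batch, time, channels), z-scored per channel beforehand.
        out, _ = self.lstm(x)
        # Classify from the final time step (a common simplification).
        return self.head(out[:, -1])   # class logits
```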
Preprint
Full-text available
Most real-world problems that machine learning algorithms are expected to solve face the situation of 1) unknown data distribution, 2) little domain-specific knowledge, and 3) datasets with limited annotation. We propose Non-Parametric learning by Compression with Latent Variables (NPC-LV), a learning framework for any dataset with abundant unlabeled data but very few labeled examples. By training only a generative model, in an unsupervised way, the framework utilizes the data distribution to build a compressor. Using a compressor-based distance metric derived from Kolmogorov complexity, together with the few labeled data, NPC-LV classifies without further training. We show that NPC-LV outperforms supervised methods on image classification on all three datasets in the low-data regime and even outperforms semi-supervised learning methods on CIFAR-10. We demonstrate how and when the negative evidence lower bound (nELBO) can be used as an approximate compressed length for classification. By revealing the correlation between compression rate and classification accuracy, we illustrate that under NPC-LV, improvements in generative models can enhance downstream classification accuracy.
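The classical instance of a compressor-based distance derived from Kolmogorov complexity is the Normalized Compression Distance (NCD); in the sketch below, gzip stands in for the paper's learned, generative-model-based compressor.

```python
# Normalized Compression Distance (NCD): a computable approximation to
# Kolmogorov-complexity distance. gzip here is a stand-in compressor.
import gzip

def clen(b: bytes) -> int:
    """Compressed length of a byte string."""
    return len(gzip.compress(b))

def ncd(x: bytes, y: bytes) -> float:
    cx, cy, cxy = clen(x), clen(y), clen(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)

# Classification then reduces to assigning the label of the nearest
# labeled example under ncd().
```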
Article
A language modeling overview, highlighting basic concepts, intuitive explanations, technical achievements, and fundamental challenges.
Article
Resource management of deep learning training (DLT) jobs is critical for cluster resource efficiency and client QoS assurance. Most existing scheduling frameworks require clients to specify job resource configuration, which can lead to over-provision or under-provision issues. Additionally, the performance of some static scheduling frameworks degrades in highly dynamic clusters. In this paper, we propose a QoS-aware joint resource optimization framework called Qore-DL for distributed DLT jobs. We divide the lifecycle of a DLT job into submission, queuing and running stages. Qore-DL automatically configures reasonable resources for submitted jobs and greedily assigns scheduled jobs to hosts. For running jobs, Qore-DL employs a heuristic scheme to adjust their resources. Qore-DL jointly considers the optimization of QoS satisfaction and resource efficiency at the three stages of DLT jobs. We implemented a prototype of Qore-DL in TensorFlow on Kubernetes and conducted extensive experiments in CPU and GPU clusters to evaluate its performance. The experiment results show that, compared with its counterparts, Qore-DL can improve the job completion rate by up to 42.4% and the cluster resource efficiency by up to 21.8%.
Conference Paper
Due to the continuous increase of information and data, Automatic Speech Recognition (ASR) systems have proven more efficient and less expensive for a variety of important tasks, such as customer relationship management. However, the most complex and accurate speech recognition models are developed and implemented for languages with abundant data, such as English and French. This paper proposes an automatic speech recognition system for the Moroccan dialect, a very low-resource language that is spoken by almost every Moroccan citizen and adopted in many public and private organizations. The proposed solution is based on a state-of-the-art architecture, Deep Speech 2 by Baidu. We tested the model on 24 hours of speech and obtained a 22.7% word error rate (WER) and a 6.03% character error rate (CER).
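The reported metrics are standard edit-distance rates; for reference, a minimal word error rate implementation is sketched below (running the same dynamic program over characters instead of words gives the character error rate).

```python
# Word error rate via Levenshtein distance over word tokens.
def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j]: edit distance between ref[:i] and hyp[:j].
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[-1][-1] / max(len(ref), 1)
```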
Article
Traffic flow prediction is an important part of the intelligent transportation system, and accurate traffic flow prediction enables better data management and control of the urban road network. Traditional methods often ignore the interaction between traffic flow factors and the spatiotemporal dependence of traffic networks. In this paper, a spatiotemporal multi-head graph attention network (ST-MGAT) is proposed for the traffic flow prediction task. In the input layer, multiple traffic flow variables are used as input to learn the nonlinearity and complexity present in the data. For modeling, the gated linear unit structure is combined with full convolution and applied to correlation learning in the temporal dimension. Spatial features are then captured using a multi-head graph attention network. Ablation experiments are designed to verify the contribution of each module, and several sets of parameter experiments are designed to obtain the best model parameters. The experimental results show that, compared with the baselines, the RMSE of the ST-MGAT model decreases by 2.340%∼22.471% and the MAE decreases by 7.417%∼26.142%, demonstrating the high performance of ST-MGAT on the traffic flow prediction task.
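A single graph-attention head of the general kind such models build on, with attention masked to graph edges, can be sketched as follows; this is the generic GAT pattern, not the ST-MGAT layer itself, and it assumes the adjacency matrix includes self-loops.

```python
# Single graph-attention head (GAT-style): attention coefficients are
# computed only over graph edges. Illustrative sketch only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GraphAttentionHead(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.W = nn.Linear(in_dim, out_dim, bias=False)
        self.a = nn.Linear(2 * out_dim, 1, bias=False)

    def forward(self, x, adj):
        # x: (n_nodes, in_dim); adj: (n_nodes, n_nodes) boolean adjacency
        # (self-loops included so every row has at least one valid score).
        h = self.W(x)
        n = h.size(0)
        pairs = torch.cat([h.unsqueeze(1).expand(n, n, -1),
                           h.unsqueeze(0).expand(n, n, -1)], dim=-1)
        e = F.leaky_relu(self.a(pairs).squeeze(-1))
        e = e.masked_fill(~adj, float("-inf"))
        alpha = torch.softmax(e, dim=-1)   # weights over neighbors
        return alpha @ h                   # aggregated node features
```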
Preprint
Full-text available
Training an ensemble of different sub-models has empirically proven to be an effective strategy for improving deep neural networks' adversarial robustness. Current ensemble training methods for image recognition usually encode image labels as one-hot vectors, which neglects dependency relationships between the labels. Here we propose a novel adversarial ensemble training approach that jointly learns the label dependencies and the member models. Our approach adaptively exploits the learned label dependencies to promote the diversity of the member models. We test our approach on the widely used datasets MNIST, FashionMNIST and CIFAR-10. Results show that our approach is more robust against black-box attacks than state-of-the-art methods. Our code is available at https://github.com/ZJLAB-AMMI/LSD.