MACHINE TRANSLATION USING DEEP
LEARNING: AN OVERVIEW
Shashi Pal Singh*1, Ajai Kumar*2, Hemant Darbari*3, Lenali Singh*4, Anshika Rastogi#4, Shikha Jain#5
*AAI, Centre for Development of Advanced Computing, Pune, India
*1shashis@cdac.in, *2ajai@cdac.in, *3darbari@cdac.in, *4lenali@cdac.in
#Banasthali Vidyapith, Banasthali, Rajasthan, India
#4anshikarastogi1992@gmail.com, #5shikhaj959@gmail.com
Abstract: This paper presents an overview of Deep Neural Networks (DNNs) and the concept of deep learning in the field of natural language processing, specifically machine translation. Nowadays, DNNs play a major role in machine learning techniques. The recursive recurrent neural network (R2NN) is one of the best techniques for machine translation; it is the combination of a recurrent neural network and a recursive neural network (such as a recursive auto-encoder). This paper describes how to train a recurrent neural network for reordering from the source to the target language using semi-supervised learning methods. The word2vec tool is required to generate word vectors of the source language, and an auto-encoder helps in the reconstruction of the vectors for the target language in a tree structure. The results of word2vec play an important role in the word alignment of the input vectors. The RNN structure is very complicated, and training word2vec on a large data file is also a time-consuming task. Hence, powerful hardware support (a GPU) is required. The GPU improves system performance by decreasing the training time.
Keywords: Neural Network (NN), Deep Neural Network (DNN), Convolutional Neural Network (CNN), Feed-Forward Neural Network (FNN), Recurrent Neural Network (RNN), Recursive Auto-Encoder (RAE), Long Short-Term Memory (LSTM).
I. INTRODUCTION
Deep learning is a recently adopted approach for machine translation. Unlike traditional machine translation, neural machine translation is a better choice for more accurate translation and also provides better performance. DNNs can be used to improve traditional systems in order to make them more efficient.
Different deep learning techniques and libraries are required for developing a better machine translation system. RNNs, LSTMs, etc. are used to train the system, which converts sentences from the source language to the target language. Adopting suitable networks and deep learning strategies is a good choice because it tunes the system towards maximizing the accuracy of the translation compared to other approaches.
A. Machine Translation
Machine translation is a method of converting a sentence from one natural language into another with the help of computerized systems, so that human assistance is not necessary.
Different approaches are available to create such systems, but a more robust technique is required to build a system better than the existing ones. A well-trained network leads the system towards its goal, which is to produce a more efficient translation system that is capable of providing good accuracy [8] [10].
B. Deep Learning
Deep learning is a new technique, widely used in different machine learning applications. It enables a system to learn like a human and to improve its efficiency with training. Deep learning methods have the capability of feature representation using supervised or unsupervised learning, even across higher and more abstract layers. Deep learning is currently used in image applications, big data analysis, speech recognition, machine translation, etc. [8].
C. Deep Neural Networks
Neural networks with more than one hidden layer are known as deep neural networks (DNNs). These networks first go through a training phase and are then deployed to solve the problem, as shown in Fig. 1. The structure of a DNN and its training process depend upon the given task.
Fig. 1. Training and implementation of neural networks
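To make this concrete, the following Python (NumPy) sketch shows a forward pass through a small DNN with two hidden layers; the dimensions, weights and data are illustrative placeholders rather than part of the system described in this paper, and in practice the weights would first be learned during the training phase before the network is applied.

import numpy as np

def dnn_forward(x, params):
    # A DNN has more than one hidden layer; here two hidden layers feed an output layer.
    W1, b1, W2, b2, W3, b3 = params
    h1 = np.tanh(W1 @ x + b1)        # first hidden layer
    h2 = np.tanh(W2 @ h1 + b2)       # second hidden layer
    return W3 @ h2 + b3              # output layer (e.g., scores)

rng = np.random.default_rng(0)
d_in, d_h, d_out = 8, 16, 4
params = (rng.normal(size=(d_h, d_in)), np.zeros(d_h),
          rng.normal(size=(d_h, d_h)), np.zeros(d_h),
          rng.normal(size=(d_out, d_h)), np.zeros(d_out))
x = rng.normal(size=d_in)            # one input vector
print(dnn_forward(x, params))        # untrained output; training would tune params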
II. DEEP LEARNING IN MACHINE
TRANSLATION
Deep learning attracts researchers to use it in machine translation. The main idea is to develop a system that works as a translator. With the help of history and past experience, a trained deep neural network translates sentences without using a large database of rules.
Machine translation consists of several related processes such as word alignment, reordering rules, language modeling, etc. Each process in text processing has appropriate DNN solutions, as shown in Table 1 [5].
TABLE 1. DNN IN MACHINE TRANSLATION

Text Processing                        DNN Solutions
Word Alignment                         FNN, RNN
Translation Rule Selection             FNN, RAE, CNN
Reordering and Structure Prediction    FNN, RAE, CNN
Language Model                         RAE, Recurrent NN (LSTM, GRU), Recursive NN
Joint Translation Prediction           FNN, RNN, CNN
III. DNN IN TRANSLATION PROCESS
After preprocessing (sentence segmentation, tokenization, etc.), the translation process starts with word alignment, followed by reordering and language modelling.
A. Word Alignment
In word alignment, the input to the system is a parallel sentence pair and the output is the set of word pairs that are most related to each other. Suppose we have a source sentence S = s1, s2, …, sn and a target sentence T = t1, t2, …, tn′; then A is the set that denotes the correspondence of words between the bilingual sentences:
A = {(i, j) : 1 <= i <= n, 1 <= j <= n′}
Here, (i, j) denotes a pair (si, tj) that are translations of each other.
A feed-forward neural network (FNN) can be used for the word alignment task, but it has been shown that a recurrent neural network (RNN) is a better choice, as it maintains the history and predicts the next alignment accurately on the basis of the previous alignments (Ax is predicted from the alignment history A1, …, Ax−1) [5].
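As a toy illustration (the sentence pair and alignment below are invented for this example, not drawn from our corpus), the alignment set A can be stored directly as a set of index pairs:

# Hypothetical English-Hindi (romanized) sentence pair with a hand-made alignment.
source = ["the", "department", "of", "electronics"]     # s1 ... s4
target = ["electronics", "ke", "vibhag"]                # t1 ... t3 (romanized gloss)

# A = {(i, j) : 1 <= i <= n, 1 <= j <= n'} where (i, j) means s_i and t_j translate each other
A = {(1, 3), (2, 3), (3, 2), (4, 1)}

for i, j in sorted(A):
    print(f"s{i} ({source[i - 1]})  <->  t{j} ({target[j - 1]})")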
We want to translate source text that consists of words, symbols, characters, etc. A strategy is required to convert words into vector form, and that conversion is based on the features of the words in the text. Word embedding is the key concept used in deep learning for finding the vector value of words. A word embedding is a continuous-space vector representation, and it has the capability to capture the semantic and syntactic features of the corresponding word. A large corpus is necessary for training so that it can capture the information needed for translation. The word vectors are used as the input of the deep neural network. A popular tool, word2vec, is available to generate such vectors [5].
Various models (CBOW, Skip-gram) and algorithms (hierarchical softmax, negative sampling) work behind the scenes in word2vec. Word2vec reduces the dimensionality of the word representation with the help of dimension reduction techniques.
Each word is now represented by a fixed-dimension vector in a continuous space. If a word vector is known, then we can easily find the vectors of the other words that are situated along the same dimensions [21].
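The following sketch shows how such fixed-dimension vectors can be generated with word2vec (here through the gensim library, assuming its 4.x API); the tiny corpus is only illustrative, whereas a real system is trained on a large corpus:

from gensim.models import Word2Vec

sentences = [
    ["cdac", "was", "established", "by", "the", "department", "of", "electronics"],
    ["the", "department", "of", "electronics", "supports", "language", "technology"],
]

model = Word2Vec(
    sentences,
    vector_size=100,   # fixed dimension of every word vector
    window=5,
    min_count=1,
    sg=1,              # 1 = skip-gram, 0 = CBOW
    negative=5,        # negative sampling (hs=1 would use hierarchical softmax)
)

vec = model.wv["electronics"]                      # 100-dimensional vector of one word
print(vec.shape, model.wv.most_similar("department", topn=3))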
Let us take an example, where V[w] represents the vector value of the word w. The verb-tense relation between word pairs can then be expressed as in equation (1):
V[play] = V[playing] − V[coming] + V[come]   (1)
Fig. 2. Word Representation in Continuous Space
We can use word2vec in machine translation to locate the vectors of the words in a corpus. If we have an English-Hindi training dataset, then the result should be an analogous vector space for the Hindi words, for which we use a shallow neural network to generate the vectors and an appropriate DNN to learn these alignments. Fig. 3 visualizes the vector representation more clearly [21].
We can easily find the similarity between words with the help of the dot product of their vector values [22]. The cosine similarity between two word vectors u and v can be calculated as
cos(u, v) = (u · v) / (||u|| ||v||)   (2)
Fig. 3. Word Representation (vector offsets capture relations such as Male-Female: MAN/WOMAN and Uncle/Aunt, and Verb Tense: Play/Playing and Come/Coming)
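The following NumPy sketch illustrates equations (1) and (2) with tiny hand-made 3-dimensional vectors (invented purely for illustration; real word2vec vectors have a much higher dimension):

import numpy as np

def cosine_similarity(u, v):
    # Equation (2): cos(u, v) = (u . v) / (||u|| * ||v||)
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

V = {
    "play":    np.array([0.8, 0.1, 0.0]),
    "playing": np.array([0.8, 0.1, 0.9]),
    "come":    np.array([0.1, 0.7, 0.0]),
    "coming":  np.array([0.1, 0.7, 0.9]),
}

# Equation (1): V[play] ~= V[playing] - V[coming] + V[come]
predicted = V["playing"] - V["coming"] + V["come"]
print(cosine_similarity(predicted, V["play"]))   # close to 1.0 for these toy vectors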
Similarity is a very useful concept in the case of rare words. Suppose we have an alignment A containing the pair (q, p), where q belongs to the source sentence S and p belongs to the target sentence T. For a rare source word r, we want to find its corresponding word in the target language: we first find the word in S nearest (most similar) to r, and then take the most suitable word s in the target T aligned to it, so that a new alignment A′ = (r, s) is generated; here, s is the required word in the target language [22].
RNN implementations of the word alignment task not only learn bilingual word embeddings but also acquire the similarity between words and use wide contextual information very effectively.
B. Rule Selection and Reordering
Once the alignment process is done, the translation process moves to the rule selection/extraction phase. Here, rules are selected/extracted on the basis of the word alignment, and the reordering model is then trained on the word-aligned bilingual text. There is a problem in choosing the right target phrase/word due to language sparseness, since a source phrase may have different meanings. If we have a rule R -> (S1, …, a, T1, …, b), it is first mapped to a vector representation and then a similarity score is calculated to select the most suitable rule. An FNN can be used to optimize this score, which leads to better translation, but a bilingually constrained recursive auto-encoder outperforms it in this task because it tries to minimize both the reconstruction error and the semantic distance. The recursive auto-encoder is trained with reordering examples that are generated from the word-aligned bilingual sentences. The RAE is capable of capturing knowledge of a phrase's word-order information [5].
The next step is reordering and predicting the structure of the sentence. A combination of a recursive neural network and a recurrent neural network (R2NN) is a good way to do this. The two main concerns here are: 1) which two candidates are composed first, and 2) in which order they should be composed. To work with tree structures, a recursive neural network is the best choice, but if we combine it with an RNN, they integrate their capabilities: the RNN maintains the history, which is useful for language modelling, and the recursive neural network generates the tree structure in a bottom-up fashion. Semi-supervised learning is used for training. R2NN is a nonlinear combination of the two [13].
C. Language Modelling
An FNN can be used to learn this model in continuous space. In this model, the concatenation of word vectors is fed to the input and hidden layers to find the probability of tn given t1, …, tn−1 [5].
A recurrent neural network can also be designed for language modelling, because it performs very well in sequence-to-sequence learning tasks. Here we give a sequence of inputs (s1, …, sn) and, on the basis of this sequence, the network predicts a sequence of outputs (t1, …, tn′). The input vectors enter the network one by one, are combined with the previous history at the hidden layer, and an output is calculated at each step [9].
The RNN computation can be explained by the following equations:
hn = sigm(Whs sn + Whh hn−1)   (3)
tn = Wth hn   (4)
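A direct NumPy transcription of equations (3) and (4) is sketched below (the weights and word vectors are random placeholders, only meant to show the recurrence):

import numpy as np

def sigm(x):
    return 1.0 / (1.0 + np.exp(-x))

def rnn_forward(source_vectors, W_hs, W_hh, W_th):
    h = np.zeros(W_hh.shape[0])
    outputs = []
    for s_n in source_vectors:
        h = sigm(W_hs @ s_n + W_hh @ h)   # equation (3): hidden state carries the history
        outputs.append(W_th @ h)          # equation (4): output at each step
    return outputs

rng = np.random.default_rng(1)
d_word, d_hidden, d_out = 4, 8, 4
W_hs = rng.normal(size=(d_hidden, d_word))
W_hh = rng.normal(size=(d_hidden, d_hidden))
W_th = rng.normal(size=(d_out, d_hidden))
sentence = [rng.normal(size=d_word) for _ in range(3)]   # three input word vectors
print(len(rnn_forward(sentence, W_hs, W_hh, W_th)))      # one output per step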
Two RNNs are required: one for the encoding and another for the decoding process. If (S, T) is the source and target sentence pair, the encoder produces a representation of the source, Encoder(s1, s2, …, sn); by using the chain rule, the conditional probability can be calculated as
P(T | S) = ∏i P(ti | s1, …, sn, t1, …, ti−1)   (5)
The decoder is the combination of a recurrent neural network and a softmax layer [17].
It is difficult to train an RNN due to long-term dependencies. LSTM networks avoid the problems that occur with plain RNNs. The model parameters are learned with the backpropagation-through-time algorithm [9] [4].
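A minimal encoder-decoder sketch along these lines, written with PyTorch (the vocabulary sizes, dimensions and random data are placeholders, and this is not the exact system used in this work): one LSTM encodes the source sentence, a second LSTM decodes the target conditioned on the encoder state, and a linear layer followed by softmax (inside the cross-entropy loss) predicts each target word.

import torch
import torch.nn as nn

class EncoderDecoder(nn.Module):
    def __init__(self, src_vocab, tgt_vocab, emb_dim=64, hid_dim=128):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb_dim)
        self.tgt_emb = nn.Embedding(tgt_vocab, emb_dim)
        self.encoder = nn.LSTM(emb_dim, hid_dim, batch_first=True)
        self.decoder = nn.LSTM(emb_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, tgt_vocab)      # softmax is applied inside the loss

    def forward(self, src_ids, tgt_ids):
        _, state = self.encoder(self.src_emb(src_ids))            # encode the source sentence
        dec_out, _ = self.decoder(self.tgt_emb(tgt_ids), state)   # decode conditioned on the source
        return self.out(dec_out)                                  # scores over the target vocabulary

model = EncoderDecoder(src_vocab=1000, tgt_vocab=1000)
src = torch.randint(0, 1000, (2, 7))     # batch of 2 source sentences, 7 tokens each
tgt = torch.randint(0, 1000, (2, 6))     # teacher forcing; in practice targets are shifted by one
logits = model(src, tgt)
loss = nn.CrossEntropyLoss()(logits.reshape(-1, 1000), tgt.reshape(-1))
loss.backward()                          # backpropagation through time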
D. Joint Translation
A joint language and translation model is used to predict the target word with the help of an unbounded history of source and target words. The RNN is the best network for this: FNNs and CNNs learn only from fixed-size contexts, whereas the RNN maintains the whole sequence, whether the translation is generated left to right or right to left [14].
IV. METHODOLOGY
It is difficult to train an RNN for word alignment, so an alternative can be used in the form of a bilingual corpus. We have created an English-Hindi bilingual corpus that contains 1,20,000 words with their feature values. Hence, we can fetch the Hindi meaning of a given word and assign vector values to it based on its features [as in Fig. 6]. That is, the vector of an English word and that of its corresponding Hindi word will be the same, and after word alignment we can proceed to further processing [as in Table 3].
The binary tree structure for the source sentence is shown in Fig. 4.
Fig. 4. Tagging and Parsing of English Sentence
The binary tree structure for the target language is shown in Fig. 5.
Fig. 5. Tagging and Parsing of Hindi Sentences
Fig. 6. Extract Information from Data
TABLE 2. DATABASE TABLE

English           Hindi   Vectors
C-DAC             -       [0.123, 0.107…]
was               …       [-0.043, 0.0105…]
established       …       [-0.0123, 0.143…]
by                …       [-0.172, -0.231…]
the department    …       [-0.124, -0.342…]
of                …       [-0.442, -0.342…]
electronics       …       [-0.334, -0.344…]
TABLE 3. RULES FOR WORD ALIGNMENT AND REORDERING

Rules for Word Alignment
R1   [C-DAC, …]
R2   [was, …]
R3   [of, …]
R4   [electronics, …]
R5   [the department, …]
R6   [by, …]
R7   [established, …]

Rules for Reordering
R8   <R7, R6>   Invert
R9   [R8, R5]   Straight
R10  [R9, R4]   Straight
R11  [R10, R3]  Straight
R12  [R11, R2]  Straight
R13  <R1, R12>  Invert
Since the vector of "C-DAC" is [0.123, 0.107…] and the vector of "was" is [-0.043, 0.0105…], we denote "C-DAC was" as their parent, and we can find the vector of this phrase by concatenating both vectors and multiplying the result by a parameter matrix. This value is then passed through an activation function, which is a nonlinear function such as tanh(.). If the vectors of the children are n-dimensional, then the parent vector is also n-dimensional. The process is repeated at each level (Fig. 7).
We can represent this whole processing with the help of a binary tree structure, where an auto-encoder is used at each node.
Here we set the parent vector as
p = f(1)(W(1)[k1; k2] + b(1))   (6)
where k1 and k2 are the vectors of the two children being composed, [k1; k2] belongs to R^(2n×1), W(1) belongs to R^(n×2n), b(1) is a bias belonging to R^(n×1), and f(1) is the element-wise activation function, i.e. tanh(.). The children are reconstructed from the parent as
[k1′; k2′] = f(2)(W(2) p + b(2))   (7)
where k1′ and k2′ are the reconstructed children, W(2) is the parameter matrix for reconstruction, b(2) is the bias for reconstruction, and f(2) is an element-wise activation function.
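The composition and reconstruction of equations (6) and (7) can be sketched in NumPy as follows (random parameters and children of dimension n = 4, purely for illustration):

import numpy as np

def compose(k1, k2, W1, b1):
    # Equation (6): p = tanh(W1 [k1; k2] + b1); the parent has the same dimension n as each child
    return np.tanh(W1 @ np.concatenate([k1, k2]) + b1)

def reconstruct(p, W2, b2):
    # Equation (7): [k1'; k2'] = tanh(W2 p + b2); the reconstructed children
    k = np.tanh(W2 @ p + b2)
    n = k.shape[0] // 2
    return k[:n], k[n:]

n = 4
rng = np.random.default_rng(2)
W1, b1 = rng.normal(size=(n, 2 * n)), np.zeros(n)       # composition parameters
W2, b2 = rng.normal(size=(2 * n, n)), np.zeros(2 * n)   # reconstruction parameters

k1 = rng.normal(size=n)                   # e.g., vector of "C-DAC"
k2 = rng.normal(size=n)                   # e.g., vector of "was"
p = compose(k1, k2, W1, b1)               # vector of the phrase "C-DAC was"
k1_rec, k2_rec = reconstruct(p, W2, b2)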
Fig. 7. Implementation of Reordering Rules
In the binary tree, each node forms a triplet (p -> k1 k2), where p is the parent vector and k1 and k2 are the children of the parent p.
For the sentence in Fig. 7, the triplets of the binary tree are represented as (v1 -> u1, u2), (v2 -> v1, u3) and (v3 -> v2, u4).
Three parameter sets are involved in the recursive auto-encoder:
1. θrec: the parameter matrices W(1) and W(2) and their biases, for both the source and target languages.
2. θreo: the parameter matrices W(1) and W(2) and the bias bo of the reordering model.
3. θw: the word embedding matrices for both the source and target languages.
There are two types of error to compute:
1. Reconstruction error
2. Reordering error
The reconstruction error measures how well the vector space representation encodes the corresponding string. For measuring the reconstruction error, we use the Euclidean distance between the input vector and the reconstructed vector:
Erec([k1; k2]; θ) = || [k1; k2] − [k1′; k2′] ||²   (8)
For constructing the binary tree, we use a greedy strategy based on the reconstruction error. Let us assume Erec([u1; u2]; θ), Erec([u2; u3]; θ) and Erec([u3; u4]; θ) are computed for the word vectors of the sentence, and Erec([u1; u2]) is the smallest error of all; then the greedy algorithm selects it and replaces u1 and u2 with their parent vector v1 generated by the recursive auto-encoder. The strategy then computes Erec([v1; u3]; θ) and Erec([u3; u4]; θ), and these steps are repeated until only one vector remains.
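This greedy tree construction can be sketched as follows (again with random, untrained parameters; in the real model the parameters are learned so that the reconstruction error is meaningful):

import numpy as np

rng = np.random.default_rng(3)
n = 4
W1, b1 = rng.normal(size=(n, 2 * n)), np.zeros(n)        # composition parameters, equation (6)
W2, b2 = rng.normal(size=(2 * n, n)), np.zeros(2 * n)    # reconstruction parameters, equation (7)

def rec_error(k1, k2):
    # Equation (8): error between the children [k1; k2] and their reconstruction [k1'; k2']
    p = np.tanh(W1 @ np.concatenate([k1, k2]) + b1)
    k_rec = np.tanh(W2 @ p + b2)
    return np.sum((np.concatenate([k1, k2]) - k_rec) ** 2), p

def greedy_tree(vectors):
    # Merge the adjacent pair with the smallest reconstruction error until one vector remains.
    nodes = list(vectors)
    while len(nodes) > 1:
        scored = [rec_error(nodes[i], nodes[i + 1]) for i in range(len(nodes) - 1)]
        best = min(range(len(scored)), key=lambda i: scored[i][0])
        nodes[best:best + 2] = [scored[best][1]]   # replace the two children with their parent
    return nodes[0]                                # root vector representing the whole phrase

u = [rng.normal(size=n) for _ in range(4)]         # word vectors u1 ... u4
root = greedy_tree(u)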
Given the training set S = {ti}, the average reconstruction error on the source side is
Erec,s(S; θ) = (1/Ns) Σp∈Ts Erec([p.k1; p.k2]; θ)   (9)
Here Ts denotes the immediate (internal) nodes of the source-side binary tree, Ns is the number of such nodes, and p.kn is the n-th child vector of parent p. Erec,t(S; θ) similarly denotes the reconstruction error of the target side, and the total reconstruction error is
Erec(S; θ) = Erec,s(S; θ) + Erec,t(S; θ)   (10)
The reordering error represents the merging order with the help of a classifier prediction. Given a training instance with children c1, c2 and order label oi, the error of a single instance is
Ereo(c1, c2, o; θ) = − d(o) · log P(o | c1, c2)   (11)
Here d(o) is the distribution of the true label: if oi = straight then the label is [1, 0], and if oi = inverted then the label is [0, 1]. With o ∈ {straight, inverted}, the reordering error over the training set is
Ereo(S; θ) = Σi Ereo(c1, c2, oi; θ)   (12)
The joint training objective function is
J = α Erec(S; θ) + (1 − α) Ereo(S; θ) + R(θ)   (13)
where α sets the preference between the reconstruction and reordering error, and R(θ) is the regularizer
R(θ) = λrec ||θrec||² + λreo ||θreo||² + λw ||θw||²   (14)
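A small sketch of how the reordering error of equation (11) and the joint objective of equation (13) could be computed (the softmax classifier, weights and the fixed reconstruction error value below are illustrative assumptions, not the trained model):

import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def reordering_error(c1, c2, label, W_reo, b_reo):
    # Equation (11): cross-entropy between the true order label d(o) and the predicted P(o | c1, c2)
    probs = softmax(W_reo @ np.concatenate([c1, c2]) + b_reo)   # [P(straight), P(inverted)]
    return -np.sum(label * np.log(probs))

def joint_objective(E_rec, E_reo, params, alpha=0.5, lam=1e-4):
    # Equation (13): J = alpha * E_rec + (1 - alpha) * E_reo + R(theta), with R as in equation (14)
    R = lam * sum(np.sum(W ** 2) for W in params)
    return alpha * E_rec + (1 - alpha) * E_reo + R

rng = np.random.default_rng(4)
n = 4
W_reo, b_reo = rng.normal(size=(2, 2 * n)), np.zeros(2)
c1, c2 = rng.normal(size=n), rng.normal(size=n)
e_reo = reordering_error(c1, c2, np.array([1.0, 0.0]), W_reo, b_reo)   # label [1, 0] = straight
print(joint_objective(E_rec=0.8, E_reo=e_reo, params=[W_reo]))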
We use the greedy strategy for constructing the binary tree representing each phrase, and the derivatives over these fixed binary trees are calculated by backpropagation through structure.
An RNN can be used together with the RAE in language modelling to predict a more accurate sequence of words [14].
V. PERFORMANCE IMPROVEMENT USING
GPUS
Deep learning applications require heavy computation because they involve large matrix multiplications, massive parallel processing and a huge number of calculations during the training phase.
A graphics processing unit (GPU) is a very good option for parallel processing and fast computation compared to a CPU. We use an NVIDIA GeForce GTX Titan X to train word2vec on a large corpus (3 GB of Wikipedia data). It can also be used in the training of the recursive auto-encoder and the recurrent neural network. A GPU not only provides better energy efficiency but also achieves substantially higher performance than a CPU [1] [12].
VI. CONCLUSION
At present, machine translation is a very active research topic in the natural language processing area. Deep learning helps to train a translation system somewhat like a human brain. RNNs and RAEs provide better results in text processing compared to other neural networks. Word alignment, reordering and language modeling can be performed with the help of a well-trained deep neural network. Word2vec generates the word vectors that are used by the recursive auto-encoder in the reconstruction task. The RNN has the capability to implement reordering rules on sentences. The GPU solves the problem of complex computation and leads the system towards good performance because it supports massive parallel computation.
VII. FUTURE WORK
Machine translation using deep learning is a good idea, but it is still far from perfect. There are many problems, such as limited vocabulary, data sparseness, maintaining the history of vector values, etc. A machine translation system also needs a very large corpus. The vanishing gradient problem is encountered when an RNN is used; one solution is LSTM networks. Working with deep LSTMs is a better choice to build a more accurate translation system. Multiple GPUs can be used to accelerate the training process. By implementing all these concepts, we will move towards an optimized machine translation system.
REFERENCES
[1] Daniel Schlegel, "Deep Machine Learning on GPU", University of Heidelberg - ZITI, 12 January 2015.
[2] Holger Schwenk, Yoshua Bengio, "Learning Phrase Representations Using RNN Encoder-Decoder for Statistical Machine Translation", arXiv, v3, 3 September 2014.
[3] Hugo Larochelle, Yoshua Bengio, Jerome Louradour, Pascal Lamblin, "Exploring Strategies for Training Deep Neural Networks", Journal of Machine Learning Research 1, 2009 (submitted 12/07).
[4] Ilya Sutskever, Oriol Vinyals, Quoc V. Le, "Sequence to Sequence Learning with Neural Networks", Google.
[5] Jiajun Zhang and Chengqing Zong, "Deep Neural Networks in Machine Translation", Institute of Automation, Chinese Academy of Sciences.
[6] Josep Crego, Jungi Kim, Guillaume Klein, Anabel Rebollo, "SYSTRAN's Pure Neural Machine Translation System", Volume 1, 18 October 2016.
[7] Kyunghyun Cho, "From Sequence Modeling to Translation", Institut de Montréal des algorithmes d'apprentissage, Département d'informatique et de recherche opérationnelle, Faculté des arts et des sciences, Université de Montréal.
[8] Li Deng and Dong Yu, "Deep Learning: Methods and Applications", Microsoft Research, One Microsoft Way, Redmond, WA 98052, USA, Vol. 7, Nos. 3-4 (2013), 197-387.
[9] Martin Sundermeyer, Ralf Schlüter, and Hermann Ney, "LSTM Neural Networks for Language Modelling", Human Language Technology and Pattern Recognition, Computer Science Department, RWTH Aachen University, Aachen, Germany.
[10] Mohamed Amine Cheragui, "Theoretical Overview of Machine Translation", African University, Adrar, Algeria, ICWIT 2012.
[11] Rico Sennrich, "Neural Machine Translation", Institute for Language, Cognition and Computation, University of Edinburgh, 18 May 2016.
[12] Seulki Bae and Youngmin Yi, "Acceleration of Word2vec Using GPUs", School of Electrical and Computer Engineering, University of Seoul, Seoul, Republic of Korea, Springer International Publishing AG, 2016.
[13] Shahnawaz, R. B. Mishra, "A Neural Network Based Approach for English to Hindi Machine Translation", International Journal of Computer Applications, Volume 53, September 2012.
[14] Shujie Liu, Nan Yang, Mu Li and Ming Zhou, "A Recursive Recurrent Neural Network for Statistical Machine Translation", Microsoft Research Asia, Beijing, China and University of Science and Technology of China, Hefei, China; Baltimore, Maryland, USA, June 23-25, 2014.
[15] Soheil Bahrampour, Naveen Ramakrishnan, Lukas Schott, Mohak Shah, "Comparative Study of Deep Learning Software Frameworks", Research and Technology Center, Robert Bosch LLC.
[16] Wei He, Zhongjun He, Hua Wu, and Haifeng Wang, "Improved Neural Machine Translation with SMT Features", Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence (AAAI-16), Volume 3, 30 March 2016.
[17] Yonghui Wu, Mike Schuster, Zhifeng Chen, "Google's Neural Machine Translation System: Bridging the Gap Between Human and Machine Translation", Volume 2, 8 October 2016.
[18] http://www.deeplearningbook.org/
[19] http://haifux.org/lectures/267/introduction-to-gpus.pdf
[20] https://arxiv.org/pdf/1411.2738.pdf
[21] http://nlp.cs.tamu.edu/resources/wordvectors.ppt
[22] http://www.minerazzi.com/tutorials/cosine-similarity-tutorial.pdf