Science topic
Machine Translation - Science topic
Explore the latest questions and answers in Machine Translation, and find Machine Translation experts.
Questions related to Machine Translation
Your students probably use DeepL, Google Translate, or both, whether they are allowed to or not, at least outside the classroom, to prepare translations or writing assignments.
At the University of Lille, France, we have decided that since we cannot forbid it, we should teach students how to use machine translation and understand its limits. We are running various teaching experiments you can read about at the following URL.
We held a day of debate on this issue, and contributors explained the strategies used at different levels, from high school to university, both for language specialists and for specialists of other disciplines: "Comment enseigner (avec) la traduction automatique ?" ("How to teach (with) machine translation?").
You will find more about this topic at https://tq2022.sciencesconf.org
How do you deal with this? What are your strategies?
Looking forward to sharing with you.
In pragmatics studies, meaning interpretation depends on the contextual features (language in use) of the text rather than on its linguistic components. In this respect, machine translation appears questionable in terms of its realization of such pragmatic features, for example in speech acts. My question is: can machine translation produce accurate translations of such texts, i.e. pragmatics-based texts?
Hello dears,
Recently, Viber added a new feature to its application that translates messages using Azure machine translation, but Kurdish is not in the list of supported languages. I would like to work on this project with a professional team.
Can you recommend research topics in the field of machine translation post-editing? I'm interested in this field.
I am looking for a journal for a survey paper in the field of Machine Translation or NLP.
Hello!
Currently I am trying to find datasets for document-level translation, not just sentence-to-sentence translation datasets.
Any suggestions?
Prof. Emerita Sue Ellen Wright from Kent State University has posted on LinkedIn a message in which she reports the death of Prof. Juan Carlos Sager.
What a sad coincidence it is that I am currently lecturing a course on Terminology at the University of Antioquia in Colombia and some excerpts of his well-known "Practical Course in Terminology Processing" were part of the discussion of the last session.
I wish I could have had the opportunity to meet him in person.
My current deep learning model has 6M parameters (which seems low to me), but it is still overfitting: the training accuracy keeps improving while the validation accuracy does not improve beyond 31%. I have tried dropout and regularization to overcome the overfitting; they slow the increase in training accuracy, but the validation accuracy is still stuck at 31% (see the sketch below). The model is for text generation. My doubts are the following:
- My data is split 65% for training, 5% for validation, and 30% for testing. Is this split valid, and can it affect my training process?
- Compared to other deep learning models (especially in NLP), which have on average 50M+ parameters, my model has far fewer parameters. What could be the possible reasons for the low validation accuracy?
- If my features are not sufficient, why is the training accuracy improving?
- How do we tell whether a model is overfitting or underfitting in machine translation, where a high validation score does not ensure that the output translation is good?
Note: The model consists of two recurrent LSTM layers and some dense layers.
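In case it helps, here is a minimal Keras sketch (layer sizes, vocabulary size, and dropout rates are assumptions, not recommendations) of where dropout and recurrent dropout can be placed in a two-layer LSTM text-generation model; recurrent_dropout regularizes the hidden-to-hidden connections, which a plain Dropout layer does not touch.

import tensorflow as tf

vocab_size, embed_dim, hidden = 20000, 256, 512  # assumed sizes

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size, embed_dim),
    # dropout acts on the layer inputs, recurrent_dropout on the recurrent state
    tf.keras.layers.LSTM(hidden, return_sequences=True, dropout=0.3, recurrent_dropout=0.3),
    tf.keras.layers.LSTM(hidden, return_sequences=True, dropout=0.3, recurrent_dropout=0.3),
    tf.keras.layers.Dense(hidden, activation="relu"),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(vocab_size, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.summary()

If validation accuracy stays flat even with regularization in place, the 5% validation split may simply be too small to give a stable estimate.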
I rewrote the RNN LSTM seq2seq regression code in Python. I would like it to work during training with sequences of different lengths, without fixed padding of all sequences, i.e. I want to feed the model a data stream with custom standardization.
I chose 5 categories, each with 20 sequences, to train the network with an equal number of time steps within a batch but different lengths between batches, similar to the MATLAB example (https://www.mathworks.com/help/deeplearning/examples/sequence-to-sequence-regression-using-deep-learning.html),
in which:
1. data of different lengths are packed into each batch to fit the model during training,
2. using that data, how can problem 1 be handled smoothly during training with different padding in every batch? I use Keras with a TensorFlow backend in Python.
I will post the code if needed.
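Here is a minimal sketch of the per-batch padding idea (variable names, feature count, and layer sizes are assumptions): sequences are grouped into batches and padded only to the longest sequence within each batch, and a Masking layer lets the LSTM ignore the padded time steps.

import numpy as np
import tensorflow as tf

def batch_generator(inputs, targets, batch_size):
    # inputs/targets: lists of arrays shaped (timesteps_i, features)
    order = np.argsort([len(x) for x in inputs])  # sorting by length keeps padding inside each batch small
    while True:
        for start in range(0, len(order), batch_size):
            idx = order[start:start + batch_size]
            max_len = max(len(inputs[i]) for i in idx)
            x = tf.keras.preprocessing.sequence.pad_sequences(
                [inputs[i] for i in idx], maxlen=max_len, dtype="float32", padding="post")
            y = tf.keras.preprocessing.sequence.pad_sequences(
                [targets[i] for i in idx], maxlen=max_len, dtype="float32", padding="post")
            yield x, y

model = tf.keras.Sequential([
    tf.keras.layers.Masking(mask_value=0.0, input_shape=(None, 3)),  # 3 = assumed feature count
    tf.keras.layers.LSTM(64, return_sequences=True),
    tf.keras.layers.TimeDistributed(tf.keras.layers.Dense(1)),
])
model.compile(optimizer="adam", loss="mse")
# model.fit(batch_generator(train_x, train_y, batch_size=20), steps_per_epoch=..., epochs=...)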
The quality of AI-based translation systems is growing very quickly (DeepL Translator, Google Translate, Bing Microsoft Translator, Amazon Translate, etc.).
- What is your experience of teaching translation in this new context?
- How are AI-based translation systems changing the profession of translator in your field of expertise today? A threat or an opportunity?
- Do you know of any translation research recently published on this topic?
I am working on a project that aims at testing the viability of training an NMT system on language-specific corpora. Any recommendations/suggestions? (Language pair: Arabic/English)
For transformer-based neural machine translation (NMT), take English-Chinese for example: we pass the English input to the encoder and let the decoder input (Chinese) attend to the encoder output to produce the final output.
What if we do not pass any input to the decoder and instead treat the model as a 'memory' model for translation? Is this possible, and what would happen?
It seems the decoder could be removed so that only the encoder remains.
Could I perform the translation task like text generation?
See:
https://github.com/salesforce/ctrl/blob/master/generation.py
https://einstein.ai/presentations/ctrl.pdf
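For what it's worth, here is a small sketch of how translation can be framed as plain conditional language modelling for a decoder-only (CTRL/GPT-style) model: source and target are concatenated with control/separator tokens, and the model is trained with the usual next-token objective on the whole string. The token names and example pairs below are illustrative assumptions only.

# special tokens (assumed): a target-language control code and an end-of-sequence marker
SEP, EOS = "<2zh>", "</s>"

pairs = [
    ("The cat sat on the mat.", "猫坐在垫子上。"),
    ("How are you?", "你好吗？"),
]

# training strings: a standard language-modelling loss on these teaches p(target | source)
train_texts = [f"{src} {SEP} {tgt} {EOS}" for src, tgt in pairs]
print(train_texts[0])

# at inference time you prompt with "source <2zh>" and generate until </s>;
# the generated continuation is the translation
prompt = f"I like machine translation. {SEP}"
print(prompt)

So in principle the encoder-decoder split can be dropped, typically at the cost of longer sequences and attention that is causal over the source as well.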
Technological inventions are progressing rapidly, and this has noticeably affected human life in many sectors, including language teaching and translation. Do you think that machines can play the role of humans as far as translation and interpretation are concerned?
I'm new to deep learning. Training a model using only CPUs is very slow. Is there any way to work with GPUs without changing my code/scripts?
I'm using Anaconda for Python programming.
The model I want to train on my data is Google's seq2seq for machine translation.
My system has an NVIDIA Corporation GM204 [GeForce GTX 970],
and I am using the Ubuntu operating system.
Any suggestions and solutions will be appreciated.
Thank you.
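A quick sanity check, assuming a GPU-enabled TensorFlow build (e.g. the tensorflow-gpu conda package plus a matching NVIDIA driver and CUDA libraries): if TensorFlow can see the GTX 970, operations are placed on the GPU automatically, without changes to the training scripts themselves.

import tensorflow as tf

print("TensorFlow version:", tf.__version__)
print("Visible GPUs:", tf.config.list_physical_devices("GPU"))  # TF 2.x API

If the printed list is empty, the GPU build or the CUDA/driver setup is the problem, not your code.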
I have a corpus of documents in English language. Each document is labelled sentence-wise with labels associated with the domain
I have another corpus with the same documents in another language.
I want to label the non-English corpus in an unsupervised fashion according to the labels of the English corpus.
There is the possibility that one sentence in English may correspond to multiple sentences in another language or vice-versa.
In this case, all the sentences that are the translation of a single original sentence will have the same label of the original sentence.
What would be the best approach? What relevant work with a similar setting should I study?
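One possible approach, sketched under the assumption that the sentence-transformers package and a multilingual model such as LaBSE are acceptable: embed both sides, align each non-English sentence to its most similar English sentence, and copy that sentence's label. One-to-many cases can be handled by keeping every alignment above a similarity threshold rather than only the single best match.

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("sentence-transformers/LaBSE")  # model choice is an assumption

en_sents = ["The patient was given aspirin.", "The trial ended in 2019."]
en_labels = ["TREATMENT", "TIMELINE"]                        # toy labels
xx_sents = ["Al paciente se le administró aspirina.", "El ensayo terminó en 2019."]

en_emb = model.encode(en_sents, convert_to_tensor=True)
xx_emb = model.encode(xx_sents, convert_to_tensor=True)

sim = util.cos_sim(xx_emb, en_emb)        # (num_xx, num_en) similarity matrix
best = sim.argmax(dim=1)                  # closest English sentence for each non-English one
xx_labels = [en_labels[i] for i in best.tolist()]
print(list(zip(xx_sents, xx_labels)))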
I am working on topic modeling for a small project in my PhD. I need to translate quite a few documents from different languages into the same language (English). I found many papers mentioning machine translation as the first step of their methodology, but they never mention the tool or API. Is there a free API I can use to translate entire documents? I am not interested in very high translation quality at the moment.
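One free option that may be enough when quality is not critical (this is only a suggestion, and the free endpoint has length and rate limits, so long documents usually need to be split into chunks): the deep-translator Python package, which wraps the public Google Translate endpoint.

from deep_translator import GoogleTranslator

doc = "Ceci est un document à traduire vers l'anglais."
print(GoogleTranslator(source="auto", target="en").translate(doc))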
Is there any research on machine translation from baby sounds to the English language?
I'd like to do research on assessing the accuracy of Google Translate using BLEU as the evaluation metric.
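As a starting point, BLEU can be computed with the sacrebleu package; here the hypotheses would be the Google Translate outputs and the references your human translations (the strings below are placeholders).

import sacrebleu

hypotheses = ["the cat sits on the mat"]
references = [["the cat is sitting on the mat"]]   # one inner list per reference set

bleu = sacrebleu.corpus_bleu(hypotheses, references)
print(bleu.score)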
I am looking for Hindi-English Code-mixed parallel data for Machine Translation.
How can punctuation marks be reflected in the written forms of sign language? I need to emphasize punctuation marks for sign language machine translation.
The performance of a machine translation system is highly impacted by the parallel corpus it is trained on. Therefore, as we all know, a good-quality, normalized and (ideally) noise-free corpus is essential. To know how an MT system is performing, we have to actually train the model first and then test its performance on the test data set. This is time-consuming, because the data sets are usually huge. For my thesis I have 5 different versions of my parallel corpus. If I want to measure their performance precisely using BLEU or some other metric, I need to train the model 5 times, and that will take a lot of my time. I was therefore wondering: is there any way to measure the quality of a parallel corpus beforehand?
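One cheap proxy that does not require training anything (a rough sketch with placeholder file names, not a substitute for a proper evaluation): sentence-length-ratio and empty-line statistics over each corpus version. Heavily skewed ratios and empty segments usually indicate misalignment or noise, and comparing these counts across the 5 versions is far faster than training 5 systems.

def length_ratio_report(src_path, tgt_path, max_ratio=3.0):
    total, empty, skewed = 0, 0, 0
    with open(src_path, encoding="utf-8") as fs, open(tgt_path, encoding="utf-8") as ft:
        for src, tgt in zip(fs, ft):
            total += 1
            s, t = len(src.split()), len(tgt.split())
            if s == 0 or t == 0:
                empty += 1
            elif max(s, t) / min(s, t) > max_ratio:
                skewed += 1
    print(f"{total} pairs, {empty} empty, {skewed} with length ratio > {max_ratio}")

# length_ratio_report("corpus_v1.src", "corpus_v1.tgt")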
I have a paper on machine translation and I need to submit it to an ISI journal. Could you suggest a journal that can review the paper in a short time?
A typical IE/NLP pipeline involves sentence segmentation, tokenization, POS tagging, chunking, entity detection, and relation extraction. Compare and contrast the functionalities provided by the following frameworks/toolkits for implementing such a pipeline, using the literature:
(a) NLTK, (b) Stanford CoreNLP, (c) Apache OpenNLP, (d) spaCy, and (e) GATE.
Consider one such task (e.g. POS tagging, chunking, or named entity recognition) and evaluate the performance of two of the above toolkits on a relevant CoNLL data set
(CoNLL-2000 for chunking, CoNLL-2002 for NER).
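As a small illustration of how two of these toolkits differ in practice (POS tagging only; this assumes the NLTK resources and the spaCy model en_core_web_sm are installed, and a proper evaluation would of course be run on the CoNLL data with accuracy/F1):

import nltk
import spacy

nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

sentence = "Stanford CoreNLP and spaCy both provide POS tagging."

print("NLTK :", nltk.pos_tag(nltk.word_tokenize(sentence)))

nlp = spacy.load("en_core_web_sm")
print("spaCy:", [(tok.text, tok.tag_) for tok in nlp(sentence)])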
How can I incorporate my data dictionary when training NMT models?
If English, can you say what dataset is used as a benchmark?
Looking for a Neural Machine Translation Tool
WSD systems are still not used in most applications that need disambiguation (for example, machine translation (MT)); WSD remains an isolated module. Many researchers develop such approaches, which are not an end in themselves, but these approaches are still not applied in the dedicated applications. Why is that?
KNN, in contrast to other algorithms, doesn't build any model. Do you think this is an advantage, given that it requires no training phase, so that when we enrich the data its performance increases?
I am working on translating Sanskrit to Hindi or Hindi to English, but there are linguistic problems and I do not have any dataset for Sanskrit-Hindi or Hindi-English. Could you please point me to a dataset resource?
As I'm new to the topic, I'm looking for information on benchmark corpora that can be obtained (not necessarily for free) for audio event classification or computational auditory scene analysis.
I'm especially interested in house/street sounds.
I am searching for a parser that uses lexical structure.
I want to know how the task of paradigm identification can be of use to tasks like MT or any other task in natural language processing.
I have multiple corpora from multiple sources, and they are in different formats. Is there any way to use all of these in Stanford NER?
I am working on a project where I need to calculate the perplexity or cross-entropy of some text data. I have been using MITLM, but it does not seem to be very well documented. Can anyone suggest an alternative to MITLM? Thanks!
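One alternative worth considering is KenLM: after building an ARPA model with its lmplz tool, the kenlm Python bindings expose per-sentence log-probability and perplexity directly (the model path below is a placeholder).

import kenlm

model = kenlm.Model("my_corpus.arpa")
sentence = "this is a test sentence"

print("log10 probability:", model.score(sentence, bos=True, eos=True))
print("perplexity:", model.perplexity(sentence))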
I am trying to develop a rule-based machine translation system using Prolog, so I need Prolog materials. Please help me.
Hi,
I need to write a tool that, given keywords, searches a database of articles and recommends to users the articles most likely to contain the relevant information. I was thinking of using the following search heuristics:
1. If the keywords appear in the text in close proximity (near each other), it is more probable that the article is on topic (see the sketch below).
2. If I can find an article on some topic from, say, 2008 and an article on the same topic from 2012 (so newer) with many negations in the text, I could assume that the older research was wrong and I should prioritise the newer article.
3. I should allow queries in which the user can specify whether they are looking for all of the keywords in one text or only one or more of them, or, for example, that some keywords must be found and some must not.
Are my assumptions correct? Do you have any better ideas for returning more accurate results?
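Regarding heuristic 1, here is a rough sketch of a proximity score (the scoring formula and window size are made up for illustration): an article only scores if every keyword occurs, and it scores higher when one occurrence of each keyword fits inside a small token window.

import itertools

def proximity_score(text, keywords, window=30):
    tokens = text.lower().split()
    positions = {kw: [i for i, t in enumerate(tokens) if t == kw.lower()] for kw in keywords}
    if any(not pos for pos in positions.values()):
        return 0.0                                   # a required keyword is missing
    # smallest span containing one occurrence of every keyword
    best_span = min(max(combo) - min(combo)
                    for combo in itertools.product(*positions.values()))
    return 1.0 + (window / (best_span + 1) if best_span < window else 0.0)

print(proximity_score("machine translation quality depends on the training corpus", ["machine", "corpus"]))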
I'm working on Arabic machine translation and I want to analyse each sentence in order to translate it into the target language. I used MADAMIRA to perform morphological analysis, and now I want to perform syntactic analysis. Which tools should I use at this stage?
Are there any open source tools for language detection and/or translation when a single sentence contains multiple languages?
I am working on context-based machine translation for English to Marathi.
I want to do a translation project about
- translating text between English and Hindi
- machine translation
- a contrastive analysis of translated text, etc.
I am not a native speaker of Hindi or English (I am Thai), but I am currently studying a translation course on Hindi and English in India, and I am thinking about a project.
It is observed that the Stanford parser produces wrong POS tags for some lexical categories.
I have considered 3 datasets and 4 classifiers, and used the Weka Experimenter to run all the classifiers on the 3 datasets in one go.
When I analyse the results, taking classifier (1) as the base classifier, the results I see are:
Dataset            | (1) functions.Linea | (2) functions.SM   | (3) meta.Additiv   | (4) meta.Additiv
------------------------------------------------------------------------------------------------------
'err_all'  (100)   | 65.53 (9.84)        | 66.14 (9.63)       | 65.53 (9.84) *     | 66.14 (9.63)
'err_less' (100)   | 55.24 (12.54)       | 62.07 (18.12) v    | 55.24 (12.54) v    | 62.08 (18.11) v
'err_more' (100)   | 73.17 (20.13)       | 76.47 (16.01)      | 73.17 (20.13) *    | 76.47 (16.02)
------------------------------------------------------------------------------------------------------
(v/ /*)            |                     | (1/2/0)            | (1/0/2)            | (1/2/0)
As far as I know:
v - indicates that the result is significantly more/better than base classifier
* - indicates that the result is significantly less/worse than base classifier
Running multiple classifiers on a single dataset is easy to interpret, but now, with multiple datasets, I am not able to interpret which is better or worse, as the indicated values do not seem to match the interpretation.
Can someone please help interpret the above results, as I wish to find which classifier performs best and on which dataset.
Also, what does the (100) next to each dataset indicate?
'err_all' (100), 'err_less' (100), 'err_more' (100)
Does anyone have a translation memory, or even pairs of translated sentences that could be prepared as a translation memory?
Hello to all. I can't find any contributions on machine translation using pregroup grammar or the Lambek calculus on the net. I am working on this and would like to know if there is any literature.
Hi All,
I have implemented a phrase-based model in MOSES, and now I want to implement a "String-to-Tree" or "Tree-to-String" model (because one of my languages is an under-resourced language, I want to add linguistic information to one side of the translation only, i.e. either the source language or the target language will have linguistic rules, but not both).
I would like to know whether there is any research paper or tutorial that shows how to implement these models in MOSES.
Thanks,
Asad
Is machine translation purely statistical, or are some cognitive algorithms also being explored in these systems? If yes, what are those algorithms and approaches?
Examples with code, tutorials, or any other useful resources would be appreciated.
I am trying to use Stanford TokensRegex to design patterns. I am attempting to match "A manager may manage at most 2 branches", which appears once in the text, but I fail to find it. Below is my code:
String file = "A store has many branches. Each branch must be managed by at most 1 manager. A manager may manage at most 2 branches. The branch sells many products. Product is sold by many branches. Branch employs many workers. The labour may process at most 10 sales. It can involve many products. Each Product includes product_code, product_name, size, unit_cost and shelf_no. A branch is uniquely identified by branch_number. Branch has name, address and phone_number. Sale includes sale_number, date, time and total_amount. Each labour has name, address and telephone. Worker is identified by id’.";
Properties props = new Properties();
props.put("annotators", "tokenize, ssplit, pos, lemma, ner, parse, dcoref");
StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
// create an Annotation with the given text and run all annotators on it
Annotation document = new Annotation(file);
pipeline.annotate(document);
// compile the token-sequence pattern once, outside the sentence loop
TokenSequencePattern pattern = TokenSequencePattern.compile("A manager may manage at most 2 branches");
// a CoreMap is essentially a Map that uses class objects as keys and has values with custom types
List<CoreMap> sentences = document.get(CoreAnnotations.SentencesAnnotation.class);
for (CoreMap sentence : sentences)
{
    // match against the sentence's token list, not the list of sentences
    TokenSequenceMatcher matcher = pattern.getMatcher(sentence.get(CoreAnnotations.TokensAnnotation.class));
    while (matcher.find())
    {
        JOptionPane.showMessageDialog(rootPane, "It has been found");
    }
}
Please suggest any books or articles that could help me learn to design patterns with Stanford TokensRegex within Stanford CoreNLP.
MaxEnt is often said to be better than SVM for natural language processing tasks like text classification, machine translation, and named entity extraction. However, I have tried training MaxEnt with different parameters and I find that SVM always outperforms MaxEnt.
Formal evaluations of machine translation (MT) systems give a user a general sense of translation accuracies. For users with a passing knowledge of a target language, it is hard to establish reliability of translation results from their native source language. Have you seen any MT systems addressing the issue of confidence? Perhaps some interactive interface features? One of my students asked me: "How do I know when to trust MT? How do I evaluate the accuracy of the results?" Neither of the extreme answers seem satisfactory. Nor are they practical. You could say: "Well, you can't trust MT". Or you could suggest: "Consult a native speaker, or learn the target language". Is there an in-between answer that's feasible? Or, is the MT field simply "not there" yet in terms of making Watson-like goodness-of-fit judgements?
Machine translation is one of the oldest subfields of artificial intelligence research. However, real progress has been much slower than expected. The use of statistical machine translation systems has led to significant improvements in translation quality. Nevertheless, systems utilizing both statistical methods and deep linguistic analysis are better.
So, are there any online courses about these techniques and how to use them?
Modern machine translation systems do not have a mechanism for interpreting input messages against models of concept systems.
Any electronic resources, including books, examples, and tutorials, are appreciated.
Say you want to know what the Russian and Ukrainian online news are saying about the recent Kiev events and the EU Summit. Or say you'd like to see first-hand what the Chinese or Malaysian authorities are reporting on the missing flight MH370.
What would you do (assuming you don't know those languages)? What would your strategies be? Have you ever been able to solve an information problem like that across languages with any particular app or technology?
I am developing a part-of-speech (POS) tagger for isiXhosa, one of South Africa's official languages. The system needs to be intelligent enough to identify the POS of each word (e.g. noun, verb, ...) based on the context that the word is in. The problem with POS tagging is ambiguity: one word can have more than one POS.
Example: the word book;
1. I want to book a 07:30 flight to Cape Town.
2. Xolani, can you please pass me that book.
The word book in the above two sentences has different part-of-speech tags (verb and noun, respectively). The same problem occurs in isiXhosa, and it is what I am solving.
Now I want to use a Maximum Entropy Markov Model (MEMM) to solve this problem. My problem is the implementation of an MEMM in PHP; can someone refer me to a link that will help me solve this problem?
Thank you in advance
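I cannot point to a PHP library, but here is a minimal sketch of the MEMM idea in Python, only to make the algorithm concrete before porting it to PHP: a per-position maximum-entropy classifier (logistic regression) conditioned on the previous tag, decoded greedily (a full MEMM would use Viterbi decoding instead). The training sentences and feature set are toy assumptions.

from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

train = [
    [("I", "PRON"), ("want", "VERB"), ("to", "PART"), ("book", "VERB"), ("a", "DET"), ("flight", "NOUN")],
    [("pass", "VERB"), ("me", "PRON"), ("that", "DET"), ("book", "NOUN")],
]

def features(words, i, prev_tag):
    return {
        "word": words[i].lower(),
        "prev_word": words[i - 1].lower() if i > 0 else "<s>",
        "prev_tag": prev_tag,            # the MEMM ingredient: condition on the previous tag
        "suffix": words[i][-2:].lower(),
    }

X, y = [], []
for sent in train:
    words = [w for w, _ in sent]
    prev = "<s>"
    for i, (_, tag) in enumerate(sent):
        X.append(features(words, i, prev))
        y.append(tag)
        prev = tag

vec = DictVectorizer()
clf = LogisticRegression(max_iter=1000).fit(vec.fit_transform(X), y)

def tag(words):
    prev, out = "<s>", []
    for i in range(len(words)):
        pred = clf.predict(vec.transform([features(words, i, prev)]))[0]
        out.append((words[i], pred))
        prev = pred
    return out

print(tag(["please", "pass", "me", "that", "book"]))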
I am working on context-based English to Marathi machine translation. I want to disambiguate nouns using a rule-based MT approach for English to Marathi MT.
When we transcribe a spoken corpus, can the resulting corpus be described as written?
Would anybody who has tried it please comment, including which MT program was used?
I want to develop a machine translation system using any one of these three techniques: RBMT, SMT, or EBMT. But I don't know how and where to start development. I've used MOSES SMT but was unable to perform any of the tasks.
Kindly suggest which technique I should use to develop a system with few resources and in little time.
PS. The source and target languages are English and Urdu, and vice versa.
At the word level, how can I state that one word is the best translation of another word?
What is your opinion about the advantages and disadvantages of online machine translation?
Especially when the languages involved are Arabic dialects, which are spoken only.