Machine Translation - Science topic

Explore the latest questions and answers in Machine Translation, and find Machine Translation experts.
Questions related to Machine Translation
  • asked a question related to Machine Translation
Question
5 answers
Your students probably use DeepL, Google Translate, or both, whether they are allowed to or not, at least outside the classroom, to prepare translations or writing assignments.
At the University of Lille, France, we have decided that since we cannot forbid it, we should teach students how to use machine translation and understand its limits. We are running various teaching experiments, which you can read about at the URL below.
We held a day of debate on this issue, "Comment enseigner (avec) la traduction automatique ?" ("How to teach (with) machine translation?"), where contributors explained the strategies used at different levels, from high school to university, both for language specialists and for specialists in other disciplines.
You will find more about this topic at https://tq2022.sciencesconf.org
How do you deal with this? What are your strategies?
Looking forward to sharing with you.
Relevant answer
Answer
I think that teaching foreign-language translation in the era of neural machine translation requires training teachers on the modern trends that have entered the field of foreign language teaching, an area in which we still lack development.
  • asked a question related to Machine Translation
Question
12 answers
In pragmatics, meaning interpretation depends on the contextual features of the text (language in use) rather than its linguistic components. In this respect, machine translation appears questionable in terms of its realization of such pragmatic features, for example in speech acts. My question is: can machine translation produce accurate translations of such pragmatics-based texts?
Relevant answer
Answer
Machine translation (MT) has undergone quite interesting developments. However, it still has some linguistic, pragmatic, and affective issues, which require human post-editing to shun any occurrence of “lost in translation” or misinterpretation that could lead to harm. For instance, the study by Taira et al. (2021) concluded that Google Translate is not fully ready for use in clinical settings because of inaccuracies that could result in harm to patients.
Nonetheless, neural machine translation (NMT) seems promising despite being expensive and having issues in translating vague expressions. NMT is powered by artificial intelligence, i.e., improvements are constantly made to ameliorate the quality of NMT output. Here is the full citation to the article by Taira et al. (2021).
Taira, B. R., Kreger, V., Orue, A., & Diamond, L. C. (2021). A pragmatic assessment of Google Translate for emergency department instructions. Journal of General Internal Medicine. https://doi.org/10.1007/s11606-021-06666-z
Good luck,
  • asked a question related to Machine Translation
Question
2 answers
Hello,
Viber recently added a feature that translates messages using Azure machine translation, but Kurdish is not in the list of supported languages. I would like to work on this project with a professional team.
Relevant answer
Answer
Good project
  • asked a question related to Machine Translation
Question
1 answer
Can you recommend research topics in the field of machine translation post-editing? I'm interested in this field.
Relevant answer
Answer
The technical translator of the future will combine several skills: a linguist; a kind of IT analyst, constantly looking for flaws in the machine translation system in order to improve it; and a machine translation specialist who brings the flaws found to the customer's attention. Translators of the future will not sell words, which the machine already generates faster and in larger volume, but the authenticity of those words: the certification of the essence of a document, the confirmation that its meaning is conveyed correctly; in other words, a responsibility that machine translation cannot assume under any circumstances. The mere ability to translate, say, a standard contract from scratch is paid less and less every day, but those who are ready to master the possibilities offered by automation systems, becoming both operators of such systems and a kind of notary certifying their output, will certainly not go unclaimed.
The post-editor should be aware of the problematic elements that may be present in the translated text, i.e. should be familiar with the features of different types of machine translation in terms of the error types inherent to each. Understanding the "behavior" of the machine translation system directly affects the post-editor's speed, since knowing in advance what errors to expect saves time.
  • asked a question related to Machine Translation
Question
8 answers
Journal for Survey paper in the field of Machine Translation or NLP.
Relevant answer
Answer
  • Discrete Mathematics and Theoretical Computer Science.
  • Journal of ICT Research and Applications.
  • Journal of the Brazilian Computer Society.
  • Interdisciplinary Journal of Information, Knowledge, and Management.
  • Journal of Computing and Information Technology.
  • International Journal of Computer Science in Sport.
  • asked a question related to Machine Translation
Question
6 answers
Hello!
Currently I am trying to find datasets for document-level translation, not just sentence-to-sentence translation datasets.
Any suggestions?
Relevant answer
Answer
I use this tool, https://www.systransoft.com/, and I have started trying https://www.deepl.com/translator, which Wolfgang R. recommended and which works great; check it out and compare. Translation by computer is still not 100% precise.
  • asked a question related to Machine Translation
Question
3 answers
Prof. Emerita Sue Ellen Wright from Kent State University has posted on LinkedIn a message in which she reports the death of Prof. Juan Carlos Sager.
What a sad coincidence that I am currently teaching a course on Terminology at the University of Antioquia in Colombia, and excerpts of his well-known "Practical Course in Terminology Processing" were part of the discussion in the last session.
I wish I could have had the opportunity to meet him in person.
Relevant answer
Answer
Sincere condolences to his family members, friends, and colleagues.
May his soul rest in peace.
  • asked a question related to Machine Translation
Question
20 answers
My current deep learning model has 6M parameters (which seems low to me), but it still shows overfitting behaviour: the training accuracy keeps improving, while the validation accuracy stops improving at 31%. I have tried dropout and regularization to overcome the overfitting; they lower the growth rate of the training accuracy, but the validation accuracy remains stuck at 31%. The model is for text generation. My doubts are the following:
  1. My data split is 65% for training, 5% for validation, and 30% for testing. Is this split valid, and can it affect my training process?
  2. Compared to other deep learning models (especially in NLP), which have 50M+ parameters on average, my model has far fewer parameters. What could be the possible reasons for the low validation accuracy?
  3. If my features are not sufficient, why is the training accuracy still improving?
  4. How do we tell whether the model is overfitting or underfitting in machine translation, where a high validation score does not ensure that the output translation is good?
Note: The model consists of two recurrent LSTM layers and some dense layers.
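For concreteness, here is a minimal Keras sketch of this kind of regularized two-layer LSTM setup; all hyperparameters (vocabulary size, sequence length, layer widths, dropout rates) are illustrative assumptions rather than my actual values:

import tensorflow as tf
from tensorflow.keras import layers, regularizers

vocab_size, embed_dim = 10000, 128  # assumed values

model = tf.keras.Sequential([
    layers.Embedding(vocab_size, embed_dim),
    # dropout on inputs and recurrent connections of each LSTM layer
    layers.LSTM(256, return_sequences=True, dropout=0.3, recurrent_dropout=0.3),
    layers.LSTM(256, dropout=0.3),
    # L2 weight penalty on the dense layer
    layers.Dense(256, activation="relu", kernel_regularizer=regularizers.l2(1e-4)),
    layers.Dense(vocab_size, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])

If validation accuracy still plateaus, note that a 5% validation split may simply be too small to give a stable estimate.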
Relevant answer
Answer
  • asked a question related to Machine Translation
Question
4 answers
I rewrote the RNN LSTM seq2seq regression code in Python. I want it to work with different sequence lengths during training, without fixed padding across all sequences, i.e. I want to feed a custom-standardized data stream to model.fit.
I chose 5 categories of 20 sequences each to train the network, with equal time-step counts within a batch but different lengths between batches, similar to this MATLAB example: https://www.mathworks.com/help/deeplearning/examples/sequence-to-sequence-regression-using-deep-learning.html
My questions are:
1. How do I pack data of different lengths into each batch to fit the model during training?
2. Given that data, how do I smoothly train with a different padding in every batch? I use Keras with a TensorFlow backend in Python.
I will post the code if needed.
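In the meantime, here is a minimal sketch of the per-batch padding idea under assumed data shapes: sequences are sorted by length and padded only to the longest sequence within each batch (variable names are hypothetical; dtype="float32" because this is a regression task):

import numpy as np
from tensorflow.keras.preprocessing.sequence import pad_sequences

def batch_generator(sequences, targets, batch_size=20):
    # sort by length so each batch holds similarly sized sequences
    order = np.argsort([len(s) for s in sequences])
    for i in range(0, len(order), batch_size):
        idx = order[i:i + batch_size]
        # pad only to the longest sequence inside this batch
        x = pad_sequences([sequences[j] for j in idx], padding="post", dtype="float32")
        y = pad_sequences([targets[j] for j in idx], padding="post", dtype="float32")
        yield x, y

# With an input shape of (None, n_features), model.fit(batch_generator(...))
# accepts a different sequence length in every batch.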
Relevant answer
Hi,
Sorry for my late update. I used an LSTM, which, it is said, cannot be used on other domains once trained on my data. Also, I included a link in the problem description. Have you checked the link?
  • asked a question related to Machine Translation
Question
3 answers
The quality of AI-based translation systems grows very quickly (DeepL Translator, Google Translate, Bing Microsoft Translator, Amazon Translate, etc.).
  • What is your experience in teaching translation in this new context?
  • How do AI-based translation systems change the profession of translator in your field of expertise today? A threat or an opportunity?
  • Do you know any translation research recently published on this topic?
Relevant answer
Answer
Dear Sir,
You asked a very relevant question. Everything is changing with the adoption of AI techniques, and now is the time to adopt AI in teaching and learning to go one step beyond traditional approaches. For example, collaborative learning (CL), personalized learning (PL), and flipped learning (FL) can be applied to teaching translation by adopting AI-based learning tools.
  • asked a question related to Machine Translation
Question
5 answers
I am working on a project that aims at testing the viability of training an NMT system on language-specific corpora. Any recommendations or suggestions? (Language pair: Arabic/English)
Relevant answer
Answer
How is the latent variable defined for a non-autoregressive model?
Any suggestion is appreciated.
  • asked a question related to Machine Translation
Question
1 answer
For transformer-based neural machine translation (NMT), take English-Chinese as an example: we pass English to the encoder, have the decoder input (Chinese) attend to the encoder output, and then produce the final output.
What if we do not pass any input to the decoder and instead treat the model as a 'memory' model for translation? Is that possible, and what would happen?
It seems the decoder could be removed, leaving only the encoder.
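For concreteness, a minimal PyTorch sketch of the contrast (sizes and shapes are illustrative assumptions):

import torch
import torch.nn as nn

d_model = 512
src = torch.rand(10, 32, d_model)  # (src_len, batch, d_model), e.g. English
tgt = torch.rand(12, 32, d_model)  # (tgt_len, batch, d_model), e.g. Chinese

# standard NMT: the decoder input attends to the encoder output
enc_dec = nn.Transformer(d_model=d_model)
out = enc_dec(src, tgt)  # shape (12, 32, 512): output length follows the target

# encoder-only: no cross-attention, so output length is tied to the source
layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=8)
encoder = nn.TransformerEncoder(layer, num_layers=6)
out_enc = encoder(src)  # shape (10, 32, 512)

Encoder-only translation is possible in principle, but the output length is then tied to the source length, so the model would have to learn a direct position-to-position mapping; this is one reason the decoder, with its cross-attention, is normally kept.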
  • asked a question related to Machine Translation
Question
8 answers
Technological inventions are progressing rapidly, and this has noticeably affected many sectors of human life, including language teaching and translation. Do you think machines can play the role of humans as far as translation and interpretation are concerned?
Relevant answer
Answer
I don't think so, because of polysemous words: one English word may have many meanings, and the machine is still unable to choose the right meaning in the right context.
  • asked a question related to Machine Translation
Question
4 answers
I'm new to deep learning. Training a model using only CPUs is very slow. Is there any way to work with GPUs without changing code/scripts?
I'm using Anaconda for Python programming.
The model to train on my data is Google's seq2seq for machine translation.
My system has an NVIDIA GM204 [GeForce GTX 970],
and I am using the Ubuntu operating system.
Any suggestions and solutions will be appreciated.
Thank you.
Relevant answer
Answer
Which library do you use for deep learning: TensorFlow or PyTorch?
If you use TensorFlow, which is one of the most common libraries, just reinstall the GPU build of TensorFlow (command: "pip install tensorflow-gpu").
Of course, there are a few more installation steps for using the GPU.
Please check this video: https://www.youtube.com/watch?v=HExRhnO5Mqs
If you install the programs correctly, you do not need to change your Python source code to use the GPU.
I hope my answer helps.
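As a quick check (TensorFlow 2.x API), you can verify that the GPU is actually visible before training:

import tensorflow as tf

# a non-empty list means ops will be placed on the GPU automatically,
# with no change to the model code
print(tf.config.list_physical_devices("GPU"))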
  • asked a question related to Machine Translation
Question
3 answers
I have a corpus of documents in English. Each document is labelled sentence-wise with domain-specific labels.
I have another corpus with the same documents in another language.
I want to label the non-English corpus in an unsupervised fashion according to the labels of the English corpus.
There is the possibility that one sentence in English may correspond to multiple sentences in another language or vice-versa.
In this case, all the sentences that are the translation of a single original sentence will have the same label of the original sentence.
What would be the best approach? Which relevant works with a similar setting should I study?
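For illustration, one possible approach (an assumption, not an established recipe for this exact setting) is to embed both corpora with a multilingual sentence encoder and let each non-English sentence inherit the label of its most similar English sentence; sentences produced by 1-to-many splits then match the same source sentence and share its label. A sketch using the sentence-transformers LaBSE model with toy data:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/LaBSE")

en_sents = ["The patient shows symptoms.", "Reduce the dosage."]   # labelled
en_labels = ["diagnosis", "treatment"]
xx_sents = ["Il paziente mostra sintomi.", "Ridurre il dosaggio."]  # unlabelled

en_emb = model.encode(en_sents, normalize_embeddings=True)
xx_emb = model.encode(xx_sents, normalize_embeddings=True)

# cosine similarity = dot product of normalized vectors; each target-language
# sentence inherits the label of its most similar English sentence
sim = xx_emb @ en_emb.T
xx_labels = [en_labels[i] for i in sim.argmax(axis=1)]
print(xx_labels)  # -> ['diagnosis', 'treatment']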
Relevant answer
Answer
Sorry, I have not been clear.
The translations were not done with the idea of creating one new sentence for each original sentence. For this reason, one sentence in English can correspond to multiple sentences in the other language, or vice versa.
  • asked a question related to Machine Translation
Question
4 answers
I am working on topic modeling for a small project in my PhD. I need to translate quite a few documents in different languages into the same language (English). I found a lot of papers mentioning machine translation as the first step of their methodology, but they never mention the tool or API. Is there a free API for translating entire documents? I am not interested in very high translation quality at the moment.
Relevant answer
Answer
Agnieszka Will geb. Gronek
newspaper articles
  • asked a question related to Machine Translation
Question
5 answers
Is there any research on machine translation from baby sound to English language?
Relevant answer
I agree with Aida that sounds produced by infants are nonlinguistic and therefore meaningless. Please share your research objective so we can be more helpful.
  • asked a question related to Machine Translation
Question
3 answers
I'd like to conduct research assessing the accuracy of Google Translate using BLEU as the metric.
Relevant answer
Answer
You are not giving enough information. Tell us what language(s) you are translating from and what language(s) you are translating them into.
To some degree, it depends on the target language (the language you are translating into). BLEU has the structure of English and similar languages in its DNA; "similar" means languages where word order (rather than word morphology) is central. In particular, BLEU is built around n-gram co-occurrence statistics. For languages that are very different from English, METEOR may be more useful than BLEU. For non-Indo-European languages (Chinese, Arabic, etc.), I would say that assessing translation quality is an open question and that BLEU will not necessarily be useful. One possibility is to annotate the low-level concepts (nouns, verbs, adjectives, adverbs, quantifiers, and important prepositions) in the source language and then have bilingual judges tell you whether each such concept is present in the target-language translation.
Perhaps some of the papers that have me as an author will be helpful to you (in particular, look at the papers that they cite). You may also want to look into the MetricsMaTr evaluations.
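For the mechanics of the BLEU computation itself (the caveats above about target language still apply), the sacreBLEU package gives standardized corpus-level scores; the strings below are toy examples:

import sacrebleu

hypotheses = ["the cat sat on the mat"]            # MT output, one per segment
references = [["the cat is sitting on the mat"]]   # one reference stream

bleu = sacrebleu.corpus_bleu(hypotheses, references)
print(bleu.score)  # corpus-level BLEU on a 0-100 scale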
  • asked a question related to Machine Translation
Question
4 answers
I am looking for Hindi-English Code-mixed parallel data for Machine Translation.
Relevant answer
Answer
Bhat, I. A., Bhat, R. A., Shrivastava, M., & Sharma, D. M. (2018). Universal Dependency Parsing for Hindi-English Code-switching. arXiv preprint arXiv:1804.05868.
  • asked a question related to Machine Translation
Question
1 answer
How can punctuation marks be reflected in the written forms of sign language? I need to emphasize punctuation marks for sign language machine translation.
Relevant answer
Answer
A lot of this information (e.g., whether something is a question) is not contained within the manual signs, but rather in the facial expressions of the signer.
  • asked a question related to Machine Translation
Question
3 answers
The performance of a machine translation system is highly impacted by the parallel corpus it is trained on. Therefore, as we all know, a good-quality, normalized, and (ideally) noise-free corpus is essential. To know how an MT system is performing, we have to train the model first and then test its performance on the test data set. This is time-consuming, because the data sets are usually huge. For my thesis I have 5 different versions of my parallel corpus. If I want to precisely measure performance using BLEU or some other metric, I need to train the model 5 times, and that will take a lot of time. I was therefore wondering: is there any way to measure the quality of a parallel corpus beforehand?
Relevant answer
Answer
Hi Musfiqur Rahman,
You can use the "human judges" method and the Kappa measure for judging the quality of a parallel corpus.
Dinh
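As a small sketch of the agreement computation mentioned above, Cohen's kappa over two judges' illustrative accept/reject ratings of sampled sentence pairs (scikit-learn implementation):

from sklearn.metrics import cohen_kappa_score

judge_a = ["good", "bad", "good", "good", "bad"]
judge_b = ["good", "bad", "bad", "good", "bad"]

# kappa corrects raw agreement for agreement expected by chance
print(cohen_kappa_score(judge_a, judge_b))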
  • asked a question related to Machine Translation
Question
2 answers
I have a paper on machine translation, and I need to submit it to an ISI journal. Could you suggest a journal that can review the paper in a short time?
Relevant answer
Answer
There are many journals, but I would suggest:
ACM Journal on Computing and Cultural Heritage
OR
journal of deaf studies and deaf education.
  • asked a question related to Machine Translation
Question
2 answers
A typical IE/NLP pipeline involves sentence segmentation, tokenization, POS tagging, chunking, entity detection, and relation extraction tasks. Compare and contrast the functionality provided by the following frameworks/toolkits for implementing such a pipeline, using the literature:
(a) NLTK, (b) Stanford CoreNLP, (c) Apache OpenNLP (d) SpaCy and (e) GATE
Consider one such task (e.g. POS tagging, chunking or Named entity recognition) and evaluate the performance of two of the above tool kits on a relevant CoNLL data set.
(CoNLL-2000 for chunking, CoNLL-2002 for NER)
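For orientation, a minimal sketch of those pipeline stages in one of the listed toolkits (NLTK); the other frameworks expose equivalent steps under different APIs:

import nltk
for pkg in ["punkt", "averaged_perceptron_tagger", "maxent_ne_chunker", "words"]:
    nltk.download(pkg, quiet=True)

text = "Stanford University is located in California."
sentences = nltk.sent_tokenize(text)       # sentence segmentation
tokens = nltk.word_tokenize(sentences[0])  # tokenization
tagged = nltk.pos_tag(tokens)              # POS tagging
entities = nltk.ne_chunk(tagged)           # named entity detection
print(entities)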
Relevant answer
Answer
I don't think a perfect article exists. I would start with the following paper and watch for further comparisons (future publications that cite it), or search for reports comparing particular features. It would make a nice paper if you surveyed them all!
  • asked a question related to Machine Translation
Question
1 answer
How can I train my data dictionary when training NMT models?
Relevant answer
Answer
Hi,
As you are dealing with sign language, i.e. image data, I wonder if we are not dealing with two challenges here:
  1. Feeding images, "visual input" into the software
  2. Training lamtram (or any other neural network) to recognize these images as a complex, pattern-based communication system
As far as I know, lamtram does not accept images to date, so maybe you should instead look into image classification frameworks.
P.S.: Just as I write this, I realize how this makes your project more complex (and more innovative, in a positive sense), just as it is more complex to analyze a YouTube video as opposed to a classic newspaper article.
Hope there are some experts who have already trained a computer to accommodate sign language.
Best regards,
Christiane
  • asked a question related to Machine Translation
Question
3 answers
If English, can you say which dataset is used as the benchmark?
Relevant answer
Answer
Actually, it depends on language, data and objectives of the study.
  • asked a question related to Machine Translation
Question
5 answers
Looking for Neural Machine Translation Tool
  • asked a question related to Machine Translation
Question
4 answers
WSD systems are still not used in most applications that need disambiguation (for example, machine translation). WSD remains an isolated module: many researchers develop these approaches, which are not an end in themselves, yet they are still not applied in the dedicated applications. Why?
Relevant answer
Answer
Dear Asma Djaidri,
Thank you for the interesting question. As you have rightly observed, WSD plays a pivotal role in machine translation because words usually have multiple senses in a given context. Notably, in human translation the translator relies on reflective, neurological mechanisms for creating a relevant translation; in machine translation, intelligent computational techniques are needed instead. As an illustration, a machine learning system can be programmed in which a classifier is trained to choose a particular sense from among the existing senses. Naturally, this is possible by carefully preparing a corpus of sense-annotated instances of a specific word. For more details, I refer you to the following links, which can hopefully answer the question.
Best regards,
R. Biria
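As a toy illustration of the supervised WSD setup described in this answer (the data and the classifier choice are assumptions for demonstration): a classifier trained on sense-annotated contexts of one ambiguous word:

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

contexts = [
    "deposit money in the bank account",
    "the bank raised its interest rates",
    "we sat on the bank of the river",
    "fishing from the grassy river bank",
]
senses = ["finance", "finance", "river", "river"]

# bag-of-words features from the context window around the ambiguous word
clf = make_pipeline(CountVectorizer(), MultinomialNB())
clf.fit(contexts, senses)
print(clf.predict(["the bank approved my loan"]))  # -> ['finance']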
  • asked a question related to Machine Translation
Question
3 answers
KNN, in contrast to other algorithms, doesn't build any model. Do you think this is an advantage, given that it doesn't commit to any trained model, so that when we enrich the data its performance increases?
Relevant answer
Answer
Hi Ikram,
Yes, this can be correct, as it doesn't make any assumptions about the characteristics of the concepts. In addition, KNN's learning cost is very low, especially when the data is small. Finally, complex concepts can be learned using simple procedures.
HTH.
Samer
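A small scikit-learn sketch of the point above: k-NN memorizes the training set and defers all computation to query time, so enriching the data is just a matter of refitting on the enlarged set:

from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X, y)              # no model is induced; the data is simply stored
print(knn.predict(X[:3]))  # the computation happens at query time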
  • asked a question related to Machine Translation
Question
3 answers
I am working on translating Sanskrit to Hindi and Hindi to English at the same time, but there are linguistic problems, and I have no dataset for Sanskrit-Hindi or Hindi-English. Please point me to dataset resources.
Relevant answer
Answer
I am working on Sanskrit-Hindi translation, but I have no Sanskrit-Hindi parallel data.
  • asked a question related to Machine Translation
Question
5 answers
As I'm new to the topic, I'm looking for information on benchmark corpora that can be obtained (not necessarily for free) for audio event classification or computational auditory scene analysis.
I'm especially interested in house/street sounds.
Relevant answer
Answer
DCASE is indeed a good reference.
Depending on what you are looking for, you can also have a look at the
Sweet-Home corpora: http://sweet-home-data.imag.fr/
and
both having sounds captured in a home.
  • asked a question related to Machine Translation
Question
5 answers
I am searching for a parser that uses lexical structure.
Relevant answer
Answer
See TurboParser (Dependency Parser with Linear Programming)
  • asked a question related to Machine Translation
Question
2 answers
I want to know how the task of paradigm identification can be of use to tasks like MT or other natural language processing tasks.
Relevant answer
Answer
Inflectional paradigms are useful when you know only one form of a word and want to construct the others. This is straightforwardly useful for MT (to construct surface forms), for POS tagging and lemmatization (to predict the inflection of out-of-dictionary words), and so on.
  • asked a question related to Machine Translation
Question
4 answers
I have multiple corpora from multiple sources, and they are in different formats. Is there any way to use all of them in Stanford NER?
Relevant answer
Answer
This happened to me when I needed to train the MALT parser, which only accepts the CoNLL format. I had three corpora, only one of which was in CoNLL format, and I had to write a script to convert between the different formats. I guess you need to do the same. Sometimes the formats differ in the information they hold, and some may be richer than others, but I hope that is not your case. I hope this helps.
  • asked a question related to Machine Translation
Question
7 answers
I am working on a project where I need to calculate the perplexity or cross-entropy of some text data. I have been using MITLM, but it does not seem to be very well documented. Can anyone suggest an alternative to MITLM? Thanks!
Relevant answer
Answer
SRILM is quite handy and well documented. The FAQ explains how to compute perplexity with the ngram tool: http://www.speech.sri.com/projects/srilm/manpages/srilm-faq.7.html
Another really short and handy explanation of perplexity with SRILM is given here: http://cmusphinx.sourceforge.net/wiki/tutoriallm
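If a Python alternative to MITLM/SRILM is acceptable, nltk.lm can also compute the perplexity of held-out n-grams under a smoothed n-gram model; a tiny illustrative sketch:

from nltk.lm import Laplace
from nltk.lm.preprocessing import padded_everygram_pipeline

train = [["the", "cat", "sat"], ["the", "dog", "ran"]]
test = [("the", "cat"), ("cat", "sat")]  # held-out bigrams

n = 2
train_grams, vocab = padded_everygram_pipeline(n, train)
lm = Laplace(n)                 # add-one smoothing avoids infinite perplexity
lm.fit(train_grams, vocab)
print(lm.perplexity(test))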
  • asked a question related to Machine Translation
Question
11 answers
I am trying to develop a rule-based machine translation system using Prolog, so I need Prolog materials. Please help me.
Relevant answer
Answer
I recommend another excellent handbook, namely "Logic, Programming and Prolog" (2nd ed.) by U. Nilsson and J. Maluszynski:
With kind regards,
Adam
  • asked a question related to Machine Translation
Question
3 answers
Hi,
I need to write a tool that, given keywords, would search a database of articles and recommend to users the articles most likely to contain the right information. I was thinking of using the following search heuristics:
1. If keywords appear in the text in close proximity (near each other), it is more probable that the article is on topic.
2. If I find an article on some topic from, say, 2008 and another on the same topic from 2012 (newer) with many negations in the text, I could assume that the old research was wrong and I should prioritise the newer article.
3. I should allow queries in which the user can specify whether they are looking for all of the keywords in one text or only one or more of them, or, for example, that some keywords must be found and some must not.
Are my assumptions correct? Do you have any better ideas to return more accurate results?
Relevant answer
Answer
Hi,
This is an information retrieval task. I suggest you read the book by Manning (http://nlp.stanford.edu/IR-book/).
One of the simplest (but also efficient and effective) ways is, as Yusniel already pointed out, to express both the documents and the query in the vector space model (https://en.wikipedia.org/wiki/Vector_space_model):
1) Pre-process your documents: remove stopwords and do stemming (and other processes you can find in the book).
2) Extract the vocabulary from the documents (all the remaining unique words).
3) Index the documents using the extracted vocabulary to create a term-document matrix, where terms (words) are in rows and documents in columns. The easiest approach is to count the frequency of each word in each document: https://en.wikipedia.org/wiki/Document-term_matrix
4) Process the query with the vocabulary you extracted from the documents to form a query vector, also counting the number of times each word appears in the query (usually they appear only once).
5) Compute the cosine similarity between your query vector and each of the document vectors: https://en.wikipedia.org/wiki/Cosine_similarity
6) Sort the similarities in descending order and select the top k documents as the most relevant for your query.
You can perform several other steps in between to improve the retrieval performance, such as extracting n-grams instead of individual words, weighting the words differently than by frequency (e.g. term frequency-inverse document frequency), vector normalization, Latent Semantic Indexing, etc.
In Python you can use the nltk package to do most of these things. Here is a webpage where they show how to do it:
Finally, there are other models besides the vector space model, such as probabilistic retrieval.
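The steps above, condensed into a scikit-learn sketch (TF-IDF weighting instead of raw counts; the documents and query are illustrative):

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "neural machine translation with attention",
    "statistical machine translation survey",
    "image classification with convolutional networks",
]
query = ["machine translation"]

vec = TfidfVectorizer(stop_words="english")      # steps 1-3
doc_matrix = vec.fit_transform(docs)
query_vec = vec.transform(query)                 # step 4
sims = cosine_similarity(query_vec, doc_matrix)  # step 5
ranking = sims[0].argsort()[::-1]                # step 6
print([docs[i] for i in ranking])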
  • asked a question related to Machine Translation
Question
1 answer
I'm working on Arabic machine translation, and I want to analyse sentences in order to translate them into the target language. I used MADAMIRA to perform morphological analysis, and now I want to perform syntactic analysis. Which tools should I use at this stage?
Relevant answer
Answer
You might also have come across the Moses MT toolkit (http://www.statmt.org/moses/)?
  • asked a question related to Machine Translation
Question
11 answers
Word alignment tool for machine translation 
Relevant answer
Answer
Great to hear that you solved the problem. It seems the issue was on the 3rd step of the installation.
  • asked a question related to Machine Translation
Question
8 answers
Are there any open source tools for language detection and/or translation when a single sentence contains multiple languages?
Relevant answer
Answer
Thanks, Santanu. It's a great suggestion. I'll give it a try with an n-gram model.
Cheers,
Thushari
  • asked a question related to Machine Translation
Question
1 answer
I am working on context-based machine translation for English to Marathi
  • asked a question related to Machine Translation
Question
3 answers
I want to do a translation project about:
- translation of text between English and Hindi
- machine translation
- a study of contrastive analysis of translated text, etc.
I am not a native speaker of Hindi or English (I am Thai), but I am currently studying a translation course on Hindi and English in India, and I am thinking about a project.
Relevant answer
Answer
How about translating Hindi and English advertisements? They are rich in semantics and language variation, and you will also explore more of the cross-cultural issues. All the best.
  • asked a question related to Machine Translation
Question
2 answers
It has been observed that the Stanford parser produces wrong POS tags for some lexical categories.
Relevant answer
Answer
If you are using the Moses decoder, you can tag and supertag your corpus accurately.
  • asked a question related to Machine Translation
Question
3 answers
I have considered 3 datasets and 4 classifiers and used the Weka Experimenter to run all the classifiers on the 3 datasets in one go.
When I analyze the results, taking classifier (1) as the base classifier, the results I see are:

Dataset          | (1) functions.Linea | (2) functions.SM | (3) meta.Additiv | (4) meta.Additiv
-----------------|---------------------|------------------|------------------|-----------------
'err_all' (100)  | 65.53 (9.84)        | 66.14 (9.63)     | 65.53 (9.84) *   | 66.14 (9.63)
'err_less' (100) | 55.24 (12.54)       | 62.07 (18.12) v  | 55.24 (12.54) v  | 62.08 (18.11) v
'err_more' (100) | 73.17 (20.13)       | 76.47 (16.01)    | 73.17 (20.13) *  | 76.47 (16.02)
-----------------|---------------------|------------------|------------------|-----------------
(v/ /*)          |                     | (1/2/0)          | (1/0/2)          | (1/2/0)

As far as I know:
v indicates that the result is significantly better than the base classifier
* indicates that the result is significantly worse than the base classifier
Running multiple classifiers on a single dataset is easy to interpret, but for multiple datasets I am not able to work out which is better or worse, as the values shown do not seem to match that interpretation.
Can someone please help me interpret the above result? I wish to find which classifier performs best, and for which dataset.
Also, what does the (100) next to each dataset indicate?
Relevant answer
Answer
The methodology for comparing n machine learning (ML) methods over m data sets is described in (Demšar, 2006). Please see the link.
I show you an example in the attached document.
  • asked a question related to Machine Translation
Question
3 answers
Does anyone have a translation memory, or even pairs of translated sentences that could be prepared as a translation memory?
Relevant answer
Answer
Our college has a Bible in Albanian. That's always a good corpus.
  • asked a question related to Machine Translation
Question
3 answers
Hello to all. I can't find any contributions on machine translation using pregroup grammar or the Lambek calculus on the net. I am working on this and wanted to know if there is any literature.
Relevant answer
Answer
Dear Muhammad,
please find the file in the attachment.
Hope it helps.
  • asked a question related to Machine Translation
Question
2 answers
Hi All,
I have implemented a phrase-based model in Moses, and now I want to implement a "string-to-tree" or "tree-to-string" model, because one of my languages is under-resourced and I therefore want to add linguistic information to one side of the translation (i.e., either the source or the target language will have linguistic rules, but not both).
I want to know whether there is any research paper or tutorial that shows how to implement these models in Moses.
Thanks,
Asad
Relevant answer
Answer
See the factored translation models section on the Moses website.
  • asked a question related to Machine Translation
Question
4 answers
Is machine translation purely statistical, or are cognitive algorithms also being explored in these systems? If so, what are those algorithms and approaches?
Relevant answer
Answer
Dear Sandeep,
Below you'll find a link to a set of excellent resources on machine learning, where you will surely be able to explore new applications and algorithms created to solve problems in this rich field:
  • asked a question related to Machine Translation
Question
1 answer
It would be appreciated if I could have examples with code, tutorials, or any other useful resources.
Relevant answer
  • asked a question related to Machine Translation
Question
4 answers
I am trying to use Stanford TokensRegex to design patterns. I am attempting to match "A manager may manage at most 2 branches", which is mentioned once in the text, but I failed to match it. Below is my code:
String file="A store has many branches. Each branch must be managed by at most 1 manager. A manager may manage at most 2 branches. The branch sells many products. Product is sold by many branches. Branch employs many workers. The labour may process at most 10 sales. It can involve many products. Each Product includes product_code, product_name, size, unit_cost and shelf_no. A branch is uniquely identified by branch_number. Branch has name, address and phone_number. Sale includes sale_number, date, time and total_amount. Each labour has name, address and telephone. Worker is identified by id’.";
Properties props = new Properties();
props.put("annotators", "tokenize, ssplit, pos, lemma, ner, parse, dcoref");
StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
// create an Annotation with the given text and run all annotators on it
Annotation document = new Annotation(file);
pipeline.annotate(document);
// compile the pattern once; a plain string is matched token by token
TokenSequencePattern pattern = TokenSequencePattern.compile("A manager may manage at most 2 branches");
// a CoreMap is essentially a Map that uses class objects as keys and has values with custom types
List<CoreMap> sentences = document.get(CoreAnnotations.SentencesAnnotation.class);
for (CoreMap sentence : sentences) {
    // run the matcher over the sentence's token list, not the list of sentences
    List<CoreLabel> tokens = sentence.get(CoreAnnotations.TokensAnnotation.class);
    TokenSequenceMatcher matcher = pattern.getMatcher(tokens);
    while (matcher.find()) {
        JOptionPane.showMessageDialog(rootPane, "It has been found");
    }
}
Please suggest any books or articles that could help me learn to design patterns with Stanford TokensRegex within Stanford CoreNLP.
Relevant answer
Answer
Consider the following code, which shows how variables are bound for use in compiling patterns:
Use Env env = TokenSequencePattern.getNewEnv() to create a new environment for bindings.
Bind a string to an attribute key (Class) lookup: env.bind("numtype", CoreAnnotations.NumericTypeAnnotation.class);
Bind patterns/strings for compiling patterns:
// Bind a string for later compilation using: compile("/it/ /was/ $RELDAY");
env.bind("$RELDAY", "/today|yesterday|tomorrow|tonight|tonite/");
// Bind a pre-compiled pattern for later compilation using: compile("/it/ /was/ $RELDAY");
env.bind("$RELDAY", TokenSequencePattern.compile(env, "/today|yesterday|tomorrow|tonight|tonite/"));
  • asked a question related to Machine Translation
Question
5 answers
In my experience, MaxEnt is always better than SVM for natural language processing tasks like text classification, machine translation, and named entity extraction. I've tried to train MaxEnt with different parameters, and I find that SVM always outperforms MaxEnt.
Relevant answer
Answer
Your question is very confusing.  First you say that MaxEnt > SVM for NLP but then you say that SVM > MaxEnt.
Is there a clarification you could add to your question?
Also, if you are using linear classifiers, I would strongly recommend checking out L1 regularized logistic regression.
If you are using kernel methods to get non-linear classifiers, check out a recent implementation of neural networks with dropout regularization, rectified linear units, and accelerated learning such as AdaGrad. You should get superior results versus your SVM or MaxEnt results.
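A sketch of the L1-regularized logistic regression suggestion (scikit-learn; the data is a toy placeholder):

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = ["great film", "terrible movie", "loved it", "awful acting"]
labels = [1, 0, 1, 0]

clf = make_pipeline(
    TfidfVectorizer(),
    # liblinear supports the L1 penalty; C controls regularization strength
    LogisticRegression(penalty="l1", solver="liblinear", C=1.0),
)
clf.fit(texts, labels)
print(clf.predict(["what a great movie"]))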
  • asked a question related to Machine Translation
Question
19 answers
Formal evaluations of machine translation (MT) systems give a user a general sense of translation accuracies. For users with a passing knowledge of a target language, it is hard to establish reliability of translation results from their native source language. Have you seen any MT systems addressing the issue of confidence? Perhaps some interactive interface features? One of my students asked me: "How do I know when to trust MT? How do I evaluate the accuracy of the results?" Neither of the extreme answers seem satisfactory. Nor are they practical. You could say: "Well, you can't trust MT". Or you could suggest: "Consult a native speaker, or learn the target language". Is there an in-between answer that's feasible? Or, is the MT field simply "not there" yet in terms of making Watson-like goodness-of-fit judgements?
Relevant answer
Answer
Despite the massive research effort in MT, MT is still immature, because high-quality translation is based on full understanding of the speech/text, and understanding speech is beyond a machine's capability. However, for a simple check to gain some confidence in the output of MT software, you may:
1. Use MT to translate your text from the source to the destination language.
2. Translate the output text back into your source language.
3. Compare your original text with the output from step 2.
4. The more similarity you get, the higher the confidence.
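The round-trip check above as a sketch; translate() is a hypothetical placeholder for whatever MT service is available, and difflib gives only a rough surface similarity in [0, 1]:

import difflib

def translate(text, src, tgt):
    raise NotImplementedError("plug in your MT system here")  # hypothetical

def round_trip_confidence(text, src, tgt):
    forward = translate(text, src, tgt)   # step 1
    back = translate(forward, tgt, src)   # step 2
    # steps 3-4: similarity of original and back-translation, in [0, 1]
    return difflib.SequenceMatcher(None, text.lower(), back.lower()).ratio()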
  • asked a question related to Machine Translation
Question
7 answers
Machine translation is one of the oldest subfields of artificial intelligence research. However, real progress was much slower. The use of statistical machine translation systems has led to significant improvements in the translation quality. Nevertheless, systems utilizing both statistical methods and deep linguistic analyses are better.
So, are there any online courses about these techniques and how to use them?
Relevant answer
Answer
Actually, there is a Massive Open On-line Course on Machine Translation entitled: "Approaches to Machine Translation: rule-based, statistical and hybrid".
It could be interesting.
For any information about it, visit:
  • asked a question related to Machine Translation
Question
13 answers
Modern machine translation systems lack a mechanism for interpreting input messages against models of concept systems.
Relevant answer
Answer
Probably it is the lack of correct classification: the same word means different things in different fields, so machines fail. Maybe they should offer the user an option to declare the field.
  • asked a question related to Machine Translation
Question
7 answers
Any electronic resources include books, example, tutorial are appreciated.
Relevant answer
Answer
A simplified definition of a token in NLP is as follows: a token is a string of contiguous characters between two spaces, or between a space and punctuation marks. A token can also be an integer, a real number, or a number with a colon (a time, for example: 2:00). All other symbols are tokens themselves, except apostrophes and quotation marks inside a word (with no space), which in many cases mark acronyms or citations. A token can represent a single word or a group of words (in morphologically rich languages such as Hebrew), as in the token "ולאחי" (VeLeAhi), which includes 4 words: "and to my brother".
"String", as written by one of the previous respondents, is a concept taken from programming languages.
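A rough regex approximation of this definition (illustrative, not a production tokenizer): it captures numbers with colons such as times, words with an optional internal apostrophe, and single punctuation symbols:

import re

# times like 2:00, then words (with optional internal apostrophe), then
# any single non-space symbol
TOKEN = re.compile(r"\d+:\d+|\w+(?:'\w+)?|[^\w\s]")

print(TOKEN.findall("At 2:00 we read the brother's book."))
# ['At', '2:00', 'we', 'read', 'the', "brother's", 'book', '.']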
  • asked a question related to Machine Translation
Question
4 answers
Except Google Translate?
Relevant answer
Answer
The following translation services support the Albanian language:
Yandex even exposes a free API with which you can build an automated translation tool (unlike Google Translate).
Cheers!
  • asked a question related to Machine Translation
Question
29 answers
Say, you want to know what the Russian and Ukrainian online news are saying about the recent Kiev events and the EU Summit? Or, say, you'd like to see first hand what the Chinese or Malaysian authorities are reporting on the missing flight MH370?
What would you do (assuming you don't know those languages)? What would be your strategies? Have you ever been able to solve an information problem like that across languages with any particular app or technology?
Relevant answer
Answer
My assumptions
1. The writing of the language should be ‘recognizable’ to the reader
2. There should be words in the text which look like words in languages known to the reader
3. The syntactic arrangements in the language should be identical to that of at least one of the languages known to the reader
4. Then, it should be possible to get information from such a text
Techniques
1. Isolate the transparent words and interrogate them
2. Refine your views by drawing on logical conclusions
3. Reconstruct your views
Practice Text (In German)
Wer sind die Gideons?
Diese Frage wird oft von den Lesern dieser Neuen Testamente gestellt. Die nachfolgenden Einzelheiten geben Ihnen darüber Auskunft. Im Herbst 1898 begegneten sich in einem Hotel in Wisconsin/USA zwei fremde Handelsreisende. Sie erkannten, daß sie beide Christen waren, deshalb hielten sie gemeinsam ihre Abendandacht. Gott gab ihnen den Gedanken, eine Vereinigung christlicher Handelsreisender zu gründen. Dieses Vorhaben führten sie im folgenden Jahr mit einem Dritten aus. Nach gemeinsamem Gebet wählten sie den Namen ‘‘Gideons’’ aus dem Buch der Richter, Kapitel 6 und 7, im Alten Testament. Gideon war der Führer einer kleinen Gruppe von Männern, die bereit waren, Gott zu dienen. Durch sie konnte Gott viel für sein Volk Israel tun.
Transparent English words noted in the text
• The heading has ‘‘Gideons and a question mark’’
(The question mark makes the reader think the writer is interrogating the name ‘‘Gideons’’. Anything that may follow may be a history and/ or explanation of the name).
• Neuen Testamente # Alten Testamente
• Hotel Wisconsin/ USA
• Christen
• Im Herbst 1898 (the syntactic arrangement suggests the phrase means 'in the year 1898')
• The position of ‘‘den’’ in the text suggests that it means ‘‘the’’.
• (Kapitel) 6 und 7
• Namen Gideon Israel
These isolated words appear to explain why the name ‘‘Gideons’’ was chosen. The passage suggests that a meeting took place at a hotel in Wisconsin, USA, in the year 1898, involving Christians, and that the meeting had to do with the New Testament. A man from Israel, whose name ‘‘Gideon’’ is under consideration, is found in chapters 6 and 7 of one of the books of the Old Testament.
I have applied the techniques, and I want people who speak German to tell me whether the technique has helped somehow in getting some information from this language that is unknown to me. I must say that some of the information derived may be right while some may not. This is natural. In real-life situations, not all the pieces of information one receives are true.
Verification
To ascertain the truth of the message, it has to be verified. In this regard, the learner would identify and seek help from a German/English bilingual. The learner can also use a reliable language translation device that may be available to him.
  • asked a question related to Machine Translation
Question
4 answers
I am developing a part-of-speech (POS) tagger for isiXhosa, one of the official languages of South Africa. The system needs to be intelligent enough to identify the POS of each word (e.g. noun, verb, ...) based on the context the word appears in. The problem with POS tagging is ambiguity: one word can have more than one POS.
Example: The word book;
1. I want to book a 07:30 flight to Cape Town.
2. Xolani can you please pass me that book.
The word book in the above two sentences has different part-of-speech tags (verb and noun, respectively). The same problem occurs in isiXhosa, and it is what I am solving.
Now I want to use a Maximum Entropy Markov Model (MEMM) to solve this problem. My problem is the implementation of an MEMM in PHP; can someone refer me to a link that will help solve this?
Thank you in advance
Relevant answer
Answer
I really don't think you want to do this in PHP, since it will be very inefficient. I would recommend doing the 'back-end' work in a Python/Java program and just passing the results to PHP. PHP scripting is designed for basic server-side things, not something like what you want to do. There are many ways of connecting Python/Java to PHP, and I am sure you can find an MEMM implementation in those languages.
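As a starting point for such a Python back-end, a tiny sketch of a maximum-entropy classifier over word-context features using NLTK (toy features and data; a full MEMM would additionally feed the previous predicted tag into the features and decode the tag sequence left to right):

from nltk.classify import MaxentClassifier

def features(sentence, i):
    # context features for the word at position i (toy feature set)
    return {
        "word": sentence[i].lower(),
        "prev": sentence[i - 1].lower() if i > 0 else "<s>",
        "suffix": sentence[i][-2:],
    }

train = [
    (features(["I", "want", "to", "book", "a", "flight"], 3), "VB"),
    (features(["pass", "me", "that", "book"], 3), "NN"),
]
tagger = MaxentClassifier.train(train, trace=0, max_iter=10)
print(tagger.classify(features(["please", "book", "it"], 1)))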
  • asked a question related to Machine Translation
Question
5 answers
I am working on context-based English-to-Marathi machine translation. I want to disambiguate nouns using a rule-based MT approach for English-to-Marathi MT.
Relevant answer
Answer
Hi Goraksh, I think this link will be quite useful for you:
The IndoWordNet and MarathiWordNet seem to have Marathi as a supported language.
As the others have said, to disambiguate words it is important to have an ontology of every word with its word senses. Luckily for us, research groups have done this for most languages, so all you have to do is use these resources for your purpose.
  • asked a question related to Machine Translation
Question
7 answers
When we transcribe a spoken corpus, can the corpus we obtain be described as written?
Relevant answer
Answer
In corpus linguistics, “any language whose original presentation was in oral form” is considered as ‘spoken language’.
  • asked a question related to Machine Translation
Question
18 answers
Would anybody who has tried it please comment, including which MT program was used?
Relevant answer
Answer
Hi Ian
Thanks for your answer - I see what you mean now. I thought you knew systems that could produce different text versions, maybe based on complexity or something else. That would really be a novelty.
I do find the Google queries very useful - they can give the kinds of interactions that are useful.
Thanks again.
  • asked a question related to Machine Translation
Question
3 answers
I want to develop a machine translation system using any one of these three techniques: RBMT, SMT, or EBMT. But I don't know how or where to start development. I've used the Moses SMT toolkit but was unable to perform any of the tasks.
Kindly suggest which technique I should use in order to develop a system with few resources and in little time.
P.S. The source and target languages are English and Urdu, and vice versa.
Relevant answer
Answer
I have used this language pair on Moses. Have you verified that Moses itself is fully installed by running it on one of the sample corpora? If yes, then run the "clean-corpus-n.perl" script from your installed Moses folder and study the difference between your "original" and "cleaned" corpus. In my own experience, a single special character was causing the training process to fail; when I removed/replaced that one character, it started working.
  • asked a question related to Machine Translation
Question
8 answers
Please let me know about Moses
Relevant answer
Answer
Hello,
There is such software as AppTek, Asia Online, Moses, Anusaaraka, Apertium, Matxin, Bing Translator, Babylon, Systran, WorldLingo, OpenLogos, and NiuTrans, but the most popular is Google Translate. However, you have to keep in mind a few shortcomings of statistical machine translation software:
1) The results are unpredictable: superficial fluency can be deceiving.
2) Statistical machine translation does not work well between languages that have significantly different word orders (e.g. Japanese and European languages).
3) The benefits are overemphasized for European languages.
  • asked a question related to Machine Translation
Question
7 answers
At the word level, how can I state that a given word is the best translation of another word?
Relevant answer
Answer
As Radu mentioned, there is no such thing as the best translation of one word, because it depends on the specific sense associated with the source-text word. Further, I would add that there is no reason to study one-word translations, because we speak with many words closely associated syntactically, so the meaning of a word ultimately depends on its interaction with the meanings of the other words it is used with.
  • asked a question related to Machine Translation
Question
10 answers
What is your opinion about the advantages and disadvantages of online machine translation?
Relevant answer
Answer
They are good for translators who don't have certain dictionaries, and they help in some ways, but in general long texts need to be translated by a person. Nuances, cultural differences, and very local vocabulary need a human translator. Yet MT can help.
  • asked a question related to Machine Translation
Question
3 answers
Especially when the Arabic dialects in question are only spoken, not written.
Relevant answer
Answer
I think the Twitter API could help with that. You can build a large dialect corpus quickly.
Thanks
Mohammed