Conference Paper

Translation memory for Indian languages: an aid for human translators

Authors:
  • Banasthali Vidyapith, India

Abstract

Translating a text into one's local language can be very time-consuming. The same text often appears several times, which is taxing for the translator, who has to provide the same translation repeatedly. This paper presents the design of a translation tool that provides automatic translations for text that reappears completely or partially. This makes the translator's task much easier, as she only has to concentrate on text that has no similarity to earlier translations.
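The abstract does not say how reappearing text is detected. As a minimal sketch, assuming a sentence-level memory and using Python's standard `difflib.SequenceMatcher` similarity with a hypothetical 0.7 threshold, a translation memory could return the stored translation automatically for an exact repeat and offer the closest stored translation for a partial repeat, leaving the translator to edit only what differs. The class, its methods and the threshold are illustrative assumptions, not the authors' design.

```python
from difflib import SequenceMatcher
from typing import Optional, Tuple

# (match_type, similarity, stored_source, stored_target)
Match = Tuple[str, float, str, str]


class TranslationMemory:
    """Toy sentence-level translation memory (hypothetical, not the paper's tool)."""

    def __init__(self, fuzzy_threshold: float = 0.7) -> None:
        # Hypothetical cut-off: fuzzy matches below this similarity are discarded.
        self.fuzzy_threshold = fuzzy_threshold
        self.segments = {}  # normalised source segment -> stored translation

    @staticmethod
    def _normalise(text: str) -> str:
        # Collapse runs of whitespace so spacing differences do not block exact matches.
        return " ".join(text.split())

    def add(self, source: str, target: str) -> None:
        """Store a source segment together with its human translation."""
        self.segments[self._normalise(source)] = target

    def lookup(self, source: str) -> Optional[Match]:
        """Return an exact match if one exists, else the best fuzzy match above the threshold."""
        key = self._normalise(source)
        if key in self.segments:
            # Completely reappearing text: reuse the stored translation as-is.
            return ("exact", 1.0, key, self.segments[key])
        best: Optional[Match] = None
        for stored, target in self.segments.items():
            score = SequenceMatcher(None, key, stored).ratio()
            if best is None or score > best[1]:
                best = ("fuzzy", score, stored, target)
        # Partially reappearing text: offer the old translation for post-editing.
        if best is not None and best[1] >= self.fuzzy_threshold:
            return best
        return None


tm = TranslationMemory()
tm.add("The office is closed today.", "आज कार्यालय बंद है।")
print(tm.lookup("The office is closed today.")[0])     # exact
print(tm.lookup("The office is closed tomorrow.")[0])  # fuzzy
```

Character-level similarity is only one possible choice; word- or n-gram-level matching would slot into the same `lookup` loop.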

... Mathur et al. [5] proposed a matching ontology evaluation tool, and Joshi et al. [6] proposed a machine translation engine. Tahir et al. [7] proposed a knowledge-based machine translation model. ...
Article
English-to-Urdu machine translation is still in its infancy and lacks simple methods that provide satisfactory and adequate English-to-Urdu translation. To make knowledge available to the masses, mechanisms and tools are needed that make text understandable by translating it automatically from the source language into the target language. Machine translation has achieved this goal with encouraging results. When decoding the source text into the target language, the translator checks all the characteristics of the text. Rule-based, computational, hybrid and neural machine translation approaches have been proposed to automate this work. In this research work, a neural machine translation approach, a Long Short-Term Memory (LSTM) encoder-decoder, is employed to translate English text into Urdu. The steps required for the translation task include preprocessing, tokenization, grammar and sentence-structure analysis, word embeddings, training-data preparation, encoder-decoder modelling, and output text generation. The results show that the model used in this work performs better in translation. The results were evaluated using bilingual evaluation metrics and showed that the test and training data yielded the highest scores for sequences with an effective length of ten (10).
... Singh et al. [18] developed a POS tagger for Marathi using supervised learning. Joshi et al. [19] further developed a technique for using machine learning to evaluate MT engines. Tyagi et al. [20] [21] developed an approach to translating complex English sentences by first simplifying them and then translating them into Hindi. ...
Article
Part-of-speech (POS) tagging is an initial step in the development of natural language processing (NLP) applications. POS tagging is a sequence-labelling task in which each word Wi of a sentence is assigned its part-of-speech tag Ti as a label, giving (W1/T1 … Wn/Tn). In this research project, POS tagging is performed on Hindi. Hindi is the fourth most widely spoken language, with several hundred million speakers across the globe. Hindi is a morphologically rich language with free word order, which makes POS tagging a very challenging task. In this paper, we show the development of a POS tagger using a neural approach.
... The third MT methodology was based on the example-based approach. For this, we used the system developed at Banasthali Vidyapith [50]. Details of the MT systems developed are shown in Table 2. ...
Article
In this paper, we explore a pivot-based approach to developing a machine translation system for a language pair for which no parallel corpus is available. For our study, we took Arabic-Hindi as the language pair for the MT system and Urdu as the pivot language. We developed four MT systems using this approach, each based on a different methodology. Among them, the Hierarchical Phrase-Based Machine Translation system produced the best results.
... The third MT methodology was based on the example-based approach. For this, we used the system developed at Banasthali Vidyapith [52]. Details of the MT systems developed are shown in Table 2. ...
Article
In this paper, we explore a pivot-based approach to developing a machine translation system for a language pair for which no parallel corpus is available. For our study, we took Arabic-Hindi as the language pair for the MT system and English as the pivot language. We developed four MT systems using this approach, each based on a different methodology. Among them, the Hierarchical Phrase-Based Machine Translation system produced the best results.
... Therefore, retrieval speed and accuracy should be evaluated using different string similarity metrics. This paper focuses on the string similarity metrics used to achieve proper translation retrieval in the Hindi-to-English TM [7] [8], which is achieved by combining an N-gram modelling [16] approach, a string similarity function and a threshold. The bigram model [13] is used to capture local context (character order and the co-occurrence of words in Hindi sentences), and the bigrams obtained from the current input sentence are then matched against the bigrams of the source sentences in the TM. ...
Article
This paper discusses a bigram model combined with the three-operation edit distance (Levenshtein distance) as a string-matching metric for translation retrieval in a Hindi-English translation memory system. In this method, statistical language modelling (the N-gram approach) is used to compute bigrams, and the dynamic-programming Levenshtein algorithm then finds the minimum number of edit operations (insertions, deletions and substitutions) required to transform one bigram into another. This distance measures how closely the current input matches a source segment in the TM and decides whether the correspondingly retrieved translation is an exact match or a fuzzy match. Other string-matching approaches were also evaluated, and Levenshtein distance proved comparatively more effective.
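The abstract leaves open exactly what the edit operations act on. The sketch below assumes whitespace-tokenised word bigrams, treats each sentence as a sequence of bigrams, and runs the three-operation Levenshtein dynamic program over those sequences. The distance is normalised by the longer bigram list; a distance of zero is an exact match, and a hypothetical similarity threshold of 0.5 separates fuzzy matches from non-matches. The function names and the threshold are assumptions for illustration.

```python
from typing import List, Sequence, Tuple


def word_bigrams(sentence: str) -> List[Tuple[str, str]]:
    """Whitespace-tokenise a sentence and return its consecutive word pairs."""
    words = sentence.split()
    return list(zip(words, words[1:]))


def levenshtein(a: Sequence, b: Sequence) -> int:
    """Three-operation edit distance (insert, delete, substitute) by dynamic programming.

    Works on any sequences (here, lists of bigrams); only two DP rows are kept.
    """
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute (or keep) x -> y
        prev = curr
    return prev[-1]


def classify_match(input_sentence: str, tm_source: str,
                   threshold: float = 0.5) -> Tuple[str, float]:
    """Label a TM source sentence as an exact, fuzzy or no match for the input."""
    a, b = word_bigrams(input_sentence), word_bigrams(tm_source)
    if not a and not b:
        # Sentences shorter than two words have no bigrams; compare the words instead.
        a, b = input_sentence.split(), tm_source.split()
    longest = max(len(a), len(b), 1)
    distance = levenshtein(a, b)
    similarity = 1 - distance / longest
    if distance == 0:
        return "exact", 1.0
    if similarity >= threshold:
        return "fuzzy", similarity
    return "none", similarity


# One changed word alters two of the six bigrams: ('fuzzy', ~0.667).
print(classify_match("यह पुस्तक मेरे मित्र ने मुझे दी", "यह पुस्तक मेरे भाई ने मुझे दी"))
```

Because a single changed word alters up to two bigrams, bigram-level similarity drops faster than word-level similarity, so the threshold would need tuning on real TM data.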
... E5 was a simple phrase-based MT system that also used the Moses MT toolkit. E6 was an example-based MT system developed by Joshi et al. [23] [24]. These three systems used the 35,000 English-Hindi parallel corpora for training and tuning. ...
Article
Machine translation remains a challenging problem for Indian languages. New machine translators are being developed every day, but high-quality automatic translation is still a distant dream, and a correctly translated Hindi sentence is rarely produced. In this paper, we focus on the English-Hindi language pair and, in order to select the correct MT output, present a ranking system that employs machine learning techniques and morphological features. The ranking requires no human intervention. We also validated our results by comparing them with human rankings.
... E5 was a simple phrase-based MT system that also used the Moses MT toolkit. E6 was an example-based MT system developed by Joshi et al. [14] [15]. These three systems used the 35,000 English-Hindi parallel corpora for training and tuning. ...
Article
Research on machine translation has been ongoing for a long time, yet the MT engines developed so far still do not produce good translations. Manually ranking their outputs is very time-consuming and expensive, and identifying which output is better or worse than the others is a taxing task. In this paper, we present an approach, based on N-gram approximations, that automatically ranks the outputs (translations) of different MT engines. Our solution requires no human intervention for ranking. We also evaluate our results and show that they are equivalent to human ranking.
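The abstract does not specify which N-gram statistics drive the ranking. One reference-free possibility, sketched here under that assumption, is to train a bigram language model with add-one smoothing on target-language (for example Hindi) text and rank each engine's output by its average log-probability per token, so that more fluent word sequences rank higher. The corpus, engine names and scoring choice are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def train_bigram_lm(corpus: List[str]) -> Tuple[Counter, Counter]:
    """Count unigrams and bigrams in a target-language corpus, with boundary markers."""
    unigrams: Counter = Counter()
    bigrams: Counter = Counter()
    for sentence in corpus:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def avg_log_prob(sentence: str, unigrams: Counter, bigrams: Counter) -> float:
    """Average add-one-smoothed bigram log-probability per predicted token."""
    vocab_size = len(unigrams) + 1  # known token types plus one slot for unseen words
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    total = 0.0
    for prev, word in zip(tokens, tokens[1:]):
        total += math.log((bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size))
    return total / (len(tokens) - 1)


def rank_outputs(outputs: Dict[str, str], unigrams: Counter, bigrams: Counter) -> List[str]:
    """Order MT engines best-first by the language-model score of their outputs."""
    return sorted(outputs,
                  key=lambda engine: avg_log_prob(outputs[engine], unigrams, bigrams),
                  reverse=True)


corpus = ["मैं घर जा रहा हूँ", "वह घर जा रही है", "मैं स्कूल जा रहा हूँ"]
unigrams, bigrams = train_bigram_lm(corpus)
outputs = {"E1": "मैं घर जा रहा हूँ", "E2": "हूँ रहा जा घर मैं"}
print(rank_outputs(outputs, unigrams, bigrams))  # ['E1', 'E2']
```

A fluency-only score like this ignores adequacy (whether the meaning of the source was preserved), so in practice it would be one signal among several.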
... This type of model, where at one end we have a parse tree and at the other end a string, is termed a Hiero grammar. We also used an example-based MT system [20] [21] that we had developed to understand the modalities of EBMT, and later used it as a translation memory. ...
Article
Machine translation evaluation is a very important activity in machine translation development. Automatic evaluation metrics proposed in the literature are inadequate because they require one or more human reference translations against which the machine translation output is compared. This does not always give accurate results, since a text can have several different valid translations. Human evaluation metrics, on the other hand, lack inter-annotator agreement and repeatability. In this paper, we propose a new human evaluation metric that addresses these issues. This metric also provides solid grounds for making sound judgements about the quality of the text produced by a machine translation system.
Article
Today, we live in a global world in which linguistic communication in several languages serves as a channel of interaction across boundaries to meet our fundamental needs. All of this is possible because of linguistic translation. Translation is the natural extension of any verbal communication we wish to express, and it spans four bridges: personal, linguistic, cultural, and commercial. All the ideas exchanged from antiquity to the present have relied on individuals who can carry words from one language to another in order to convey sentences, thoughts, feelings, and themes. India is an ethnically and linguistically diverse country that acknowledges and respects many cultures, traditions, and languages, and linguistic variety is extensive and distinct in every region of the nation. India has five language groups, 14 main writing systems, 400 spoken languages, and thousands of dialects. Throughout India's history, translation has served to bind the country together; without it, notions such as "Indian literature," "Indian culture," "Indian philosophy," and "Indian knowledge systems" would not have been possible. Translation is critical in a multilingual society like India, where the language changes every 25 kilometres, and translation among multiple languages encourages the national integration of the country's many regional cultures. Linguistic translation creates multilingual knowledge across different linguistic cultures and literatures, and national cohesion may be achieved by developing a shared social vision. Translation is also required for the country's emotional liberation and well-being. This paper describes the scope, need, importance, and procedures of translation into and out of Indian languages, with special reference to the conference theme "Translating Across Languages, Cultures, & Disciplines".
We framed questions for data collection on the scope and procedures of language translation, its popularity, and future trends, and collected online responses from the relevant respondents. The study used descriptive techniques and a random sampling method. Data were collected from a variety of primary and secondary sources, including websites, magazines, journals, Google Forms, and expert comments. A questionnaire (Google Form) was used to capture the data, which were then analyzed with diagrams. Keywords: translation, Indian languages, communication, multilingualism, accuracy,
Conference Paper
Quality estimation is a new research area in natural language processing in which machine learning techniques are used to estimate the quality of machine translation outputs. In this paper, we discuss our experience of performing quality estimation for the English-Hindi language pair. We use seventeen language-independent features, then add some linguistic features to the feature set, and analyze the performance of the system by training two different classifiers on distinct sets. The results of these classifiers are compared with the results of human evaluators and with some popular evaluation metrics.
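The abstract does not list the seventeen features. As a rough illustration, assuming whitespace tokenisation, the sketch below computes a few shallow, language-independent features of the general kind used in QE (token counts, length ratio, type/token ratio, punctuation counts) for one source sentence and its MT output. The feature names are hypothetical and this is not the paper's feature set; vectors like these would then be passed to the classifiers.

```python
import string
from typing import Dict

# ASCII punctuation plus the Devanagari danda (U+0964), used as a full stop in Hindi.
PUNCTUATION = set(string.punctuation) | {"\u0964"}


def shallow_features(source: str, target: str) -> Dict[str, float]:
    """A handful of language-independent features for one (source, MT output) pair."""
    src, tgt = source.split(), target.split()
    n_src, n_tgt = len(src), len(tgt)
    return {
        "src_tokens": n_src,
        "tgt_tokens": n_tgt,
        # Unusually long or short outputs often signal omissions or additions.
        "length_ratio": n_tgt / n_src if n_src else 0.0,
        "src_avg_token_len": sum(map(len, src)) / n_src if n_src else 0.0,
        # Share of distinct tokens in the output; repetitive output lowers it.
        "tgt_type_token_ratio": len(set(tgt)) / n_tgt if n_tgt else 0.0,
        "src_punct": sum(ch in PUNCTUATION for ch in source),
        "tgt_punct": sum(ch in PUNCTUATION for ch in target),
    }


print(shallow_features("The office is closed today .", "आज कार्यालय बंद है \u0964"))
```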
Article
MT evaluation is a very important activity in MT system development. Evaluating MT systems helps developers understand the shortcomings of their systems and focus clearly on the problem areas, so that system performance can be improved. In this paper, we discuss the evaluation of several English-Hindi MT engines, to which we applied both human and automatic evaluation. The automatic evaluation metrics used in this study operate at different linguistic levels.
Article
From a project manager's perspective, machine translation (MT) evaluation is the most important activity in MT development, since its results show how far the development task has progressed. Traditionally, MT evaluation is done either by human experts who know both the source and target languages or by automatic evaluation metrics, and both techniques have their pros and cons. Human evaluation is very time-consuming and expensive, but it gives a good and accurate picture of the status of MT engines. Automatic evaluation metrics, on the other hand, produce results very quickly but lack the precision of human judges. There is therefore a need for a mechanism that produces fast results with a good correlation to human evaluation. In this paper, we address this issue by applying machine learning techniques to MT evaluation, and we compare the results of this evaluation with those of human and automatic evaluation.
Article
Research in machine translation has been going on for the past 60 years, and many new techniques are developed in this field every day. As a result, we have witnessed the development of many automatic machine translators. The manager of a machine translation development project needs to know whether performance has increased or decreased after changes are made to the system, which is why the need for evaluating machine translation systems arose. In this article, we present the evaluation of several machine translators by a human evaluator and by automatic evaluation metrics, at the sentence, document and system levels. Finally, we compare the two kinds of evaluation.
Conference Paper
Machine translation evaluation is the most formidable activity in machine translation development. We present MT evaluation results for some of the machine translators available online for English-Hindi machine translation. The systems are measured using automatic evaluation metrics and subjective human judgements.
Conference Paper
Translators are increasingly turning to electronic language resources and tools to help them cope with the demand for fast, high-quality translation. While translation memory tools seem to be well known in the translation industry at large, bilingual concordancers appear to be familiar primarily in academic circles. The strengths and weaknesses of these two types of tool are analyzed in an effort to recommend those circumstances in which each could best be applied.
Conference Paper
The paper introduces a new research strategy for the investigation of human translation behavior. While conventional cognitive research methods make use of think-aloud protocols (TAP), we introduce and investigate User-Activity Data (UAD). UAD consists of the translator's recorded keystroke and eye-movement behavior, which makes it possible to replay a translation session and to register the subjects' comments on their own behavior during a retrospective interview. UAD has the advantage of being objective and reproducible and, in contrast to TAP, does not interfere with the translation process. The paper gives the background of this technique and an example of an English-to-Danish translation. Our goal is to elaborate and investigate cognitively grounded basic translation concepts which are materialized and traceable in the UAD and which, at a later stage, will provide the basis for appropriate and targeted help for the translator at a given moment.
Article
Current machine translation (MT) systems are still not perfect. In practice, the output from these systems needs to be edited to correct errors. One way of increasing the productivity of the whole translation process (MT plus human work) is to incorporate the human correction activities within the translation process itself, thereby shifting the MT paradigm to that of computer-assisted translation. This model entails an iterative process in which the human translator's activity is included in the loop: in each iteration, a prefix of the translation is validated (accepted or amended) by the human, and the system computes its best (or n-best) translation suffix hypothesis to complete this prefix. A successful framework for MT is the so-called statistical (or pattern recognition) framework. Interestingly, within this framework, adapting MT systems to the interactive scenario mainly affects the search process, allowing extensive reuse of successful techniques and models. In this article, alignment templates, phrase-based models, and stochastic finite-state transducers are used to develop computer-assisted translation systems. These systems were assessed in a European project (TransType2) on two real tasks: the translation of printer manuals and the translation of the Bulletin of the European Union. In each task, the following three language pairs were involved (in both translation directions): English-Spanish, English-German, and English-French.
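In the systems described, the search itself is constrained by the validated prefix. A much simpler illustration of one iteration of that loop, assuming a fixed n-best list of complete hypotheses with model scores (higher is better) and whole-word prefixes, is to pick the best-scoring hypothesis that agrees with what the translator has already validated and propose its remaining words as the suffix. The function name and example data are hypothetical.

```python
from typing import List, Optional, Tuple


def complete_prefix(prefix: str, nbest: List[Tuple[str, float]]) -> Optional[str]:
    """Return the suffix of the best-scoring hypothesis that agrees with the validated prefix.

    `nbest` holds (hypothesis, score) pairs, higher score = better. Returns None
    when no hypothesis in the list is compatible with the prefix.
    """
    prefix_words = prefix.split()
    compatible = []
    for hypothesis, score in nbest:
        words = hypothesis.split()
        # Keep only hypotheses whose opening words match the validated prefix.
        if words[: len(prefix_words)] == prefix_words:
            compatible.append((score, words))
    if not compatible:
        return None
    _, best_words = max(compatible, key=lambda item: item[0])
    return " ".join(best_words[len(prefix_words):])


nbest = [("the printer is ready to print", -1.2),
         ("the printer prints now", -0.8),
         ("the printer is not ready", -2.0)]
print(complete_prefix("the printer", nbest))     # prints now
print(complete_prefix("the printer is", nbest))  # ready to print
```

Real interactive systems also handle a partially typed last word and regenerate hypotheses when none of the n-best list is compatible, rather than returning None.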
Article
A translation memory system is a new type of human language technology (HLT) tool that is gaining popularity among translators. Such tools allow translators to store previously translated texts in a type of aligned bilingual database, and to recycle relevant parts of these texts when producing new translations. Currently, these tools retrieve information from the database using superficial character string matching, which often results in poor precision and recall. This paper explains how translation memory systems work, and it considers some possible ways for introducing more sophisticated information retrieval techniques into such systems by taking syntactic and semantic similarity into account. Some of the suggested techniques are inspired by those used in other areas of HLT, and some by techniques used in information science.
Article
The only way in which the power of computers has been brought to bear on the problem of language translation is machine translation, that is, the automation of the entire process. Machine translation is an excellent research vehicle but stands no chance of filling actual needs for translation which are growing at a great rate. In the quarter century during which work on machine translation has been going on, there has been considerable progress in relevant areas of computer science. However, advances in linguistics, important though they may have been, have not touched the core of this problem. The proper thing to do is therefore to adopt the kinds of solution that have proved successful in other domains, namely to develop cooperative man–machine systems. This paper proposes a translator's amanuensis, incorporating into a word processor some simple facilities peculiar to translation. Gradual enhancements of such a system could eventually lead to the original goal of machine translation.
Cognitive Adaptation in Translation. Letras de Hoje
  • A Buchweitz
  • F Alves
Beyond Translation Memories. Paper presented at the Workshop on Example Based Machine Translation, held at Machine Translation Summit VIII
  • R Schäler