Chapter

Pedagogical Applications of Chinese Parallel Corpora

Abstract

Parallel corpora are a unique resource in language acquisition: by providing parallel representations of text in two or more languages, they enable learners to conceptualize a target language through the established schemas of their first language. Parallel corpora are defined as specialized translation corpora that consist of source texts in one language aligned with translation texts in one or more additional languages. This chapter explores the pedagogical application of parallel corpora in general before taking an in-depth look at how English L1 beginning-level learners of Mandarin Chinese applied a Chinese–English parallel corpus. In addition to reporting the specific outcomes observed in this learning context, the chapter details numerous parallel corpus resources with suggestions for pedagogical application, and it enumerates and analyzes potential further applications based on continued research in the field.

Article
This article delves into the trajectory of corpus translation studies (CTS) over the past two decades, summarizing key areas of existing research and identifying potential gaps and challenges within the field. The review encompasses various research areas, including translation universals, translator style, translation norms, and translation pedagogy. It acknowledges the valuable contributions made in these areas while also highlighting potential areas for improvement, such as the need to incorporate functional aspects in translator style research and align translation training programs with professional requirements. The review introduces the new research agenda for CTS proposed by De Sutter and Lefer (De Sutter, Gert, and Marie-Aude Lefer. 2020. “On the Need for a New Research Agenda for Corpus-Based Translation Studies: A Multi-Methodological, Multifactorial and Interdisciplinary Approach.” Perspectives 28 (1): 1–23), which advocates for multifactorial designs, methodological pluralism, and interdisciplinarity. This agenda facilitates two analysis modes: one utilizing corpus methods to examine translation products, and the other employing diverse methods to investigate products, processes, participants, and contexts in corpus-assisted translation practices. It is argued that these two analysis modes offer valuable guidance for future corpus-assisted translation studies.
Article
Bertalign is designed to improve sentence alignment accuracy for Chinese–English parallel corpora of literary texts. Aligning bilingual literary texts is not trivial, since most of the translation is interpretative and not based on 1-to-1 mappings between source and target sentences. Existing alignment methods highlight 1-to-1 links while having difficulty coping with 1-to-many and many-to-many alignments that are common in literary texts. To overcome the weaknesses of current approaches, we propose a novel two-step algorithm for bilingual sentence alignment. The first step finds the optimal paths for 1-to-1 alignments based on the top-k most semantically similar target sentences for each source sentence using the bidirectional encoder representations from transformer-based cross-lingual word embeddings. The second step relies on search paths found in the previous step to recover all valid alignments with more than one sentence on each side of the bilingual text. A comprehensive experiment was conducted on a newly built Chinese–English literary parallel corpus and a large-scale publicly available bilingual corpus of the Bible to compare the performance of Bertalign with five baseline systems: Gale-Church, Hunalign, Bleualign, Bleurtalign, and Vecalign. The results show that Bertalign achieves higher accuracy, in terms of F1 score, than the previous methods on both evaluation datasets.
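The idea behind the first step described above (restrict each source sentence to its top-k most similar target sentences, then search for the best monotonic 1-to-1 path) can be illustrated with a minimal sketch. This is not Bertalign's implementation: the tiny hand-made vectors stand in for the cross-lingual sentence embeddings, the dynamic program is a generic monotonic alignment, and the function names `cosine`, `top_k_candidates`, and `align_one_to_one` are hypothetical. The second step (recovering 1-to-many and many-to-many links) is not shown.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def top_k_candidates(src_vecs, tgt_vecs, k):
    """For each source sentence, the set of k most similar target indices."""
    candidates = []
    for u in src_vecs:
        ranked = sorted(range(len(tgt_vecs)),
                        key=lambda j: cosine(u, tgt_vecs[j]),
                        reverse=True)
        candidates.append(set(ranked[:k]))
    return candidates


def align_one_to_one(src_vecs, tgt_vecs, k=2):
    """Monotonic 1-to-1 alignment with maximal total similarity.

    A pair (i, j) may only be linked if j is among the top-k candidates
    of source sentence i; unlinked sentences on either side are skipped.
    """
    n, m = len(src_vecs), len(tgt_vecs)
    cands = top_k_candidates(src_vecs, tgt_vecs, k)
    # best[i][j]: best score using the first i source and first j target sentences
    best = [[0.0] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            options = []
            if i > 0:
                options.append((best[i - 1][j], "skip_src"))
            if j > 0:
                options.append((best[i][j - 1], "skip_tgt"))
            if i > 0 and j > 0 and (j - 1) in cands[i - 1]:
                score = best[i - 1][j - 1] + cosine(src_vecs[i - 1], tgt_vecs[j - 1])
                options.append((score, "match"))
            best[i][j], back[i][j] = max(options, key=lambda o: o[0])
    # Trace the chosen moves back from the final cell to recover the links.
    pairs, i, j = [], n, m
    while i > 0 or j > 0:
        move = back[i][j]
        if move == "match":
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif move == "skip_src":
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return pairs


# Toy vectors (assumed, for illustration): the target side has one extra sentence.
SRC = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
TGT = [[0.9, 0.1, 0], [0, 1, 0.1], [0.5, 0.5, 0], [0, 0, 1]]
print(align_one_to_one(SRC, TGT, k=2))
```

Limiting matches to the top-k candidates keeps the search focused on plausible links, at the cost of missing a correct link whose similarity happens to rank below k.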
Article
This paper aims to introduce the language corpora and the advantages of their use in the process of Chinese language acquisition. We provide practical examples of the corpora's direct and indirect use for teaching and learning Chinese as a second language. The exploratory approach towards Chinese by using various types of corpora is applicable for general language seminars as well as specialized translation seminars. The indirect use is mainly linked to the preparation of teaching materials and facilitates the curriculum design.
Chapter
This chapter describes the research design and the procedural details of the current study.
Chapter
The current chapter is an analysis of the background survey and the translation experiments of this study.
Chapter
This chapter briefly introduces the background and provides the necessary context of the current study.
Chapter
The current chapter mainly deals with the attitudes of students towards corpus use in translation.
Chapter
The current chapter reviews the related literature to provide the necessary background for the present study.
Book
This book sheds new light on corpus-assisted translation pedagogy, an intersection of three distinct but cognate disciplines: corpus linguistics, translation and pedagogy. By taking an innovative and empirical approach to translation teaching, the study utilizes mixed methods, including translation experiments, surveys and in-depth focus groups. The results demonstrated the unique advantages and at the same time called attention to possible pitfalls of using corpora for translation teaching purposes. This book enriches our understanding of corpus application in the setting of translation between Chinese and English, two markedly different languages. Readers will also discover new horizons in this burgeoning and interdisciplinary field of research. This book appeals to a broad readership: scholars and researchers interested in translation technology as a way to widen the scope of translation studies, translation trainers in search of effective teaching approaches, and the growing number of cross-disciplinary postgraduate students seeking to improve their translation skills and competence.
Article
This study reveals the impact of using parallel corpora on EFL students’ writing, and how students perceive it. Female undergraduates (n=46) in an EFL writing course in Saudi Arabia were divided randomly into experimental and control groups taught by the same instructor, using the same materials. Students in the experimental group were introduced to three parallel corpora and encouraged to use them in their writing. Tests at the beginning of the semester showed no difference in English proficiency or writing ability between groups. Over the semester, students in both groups also completed five writing assignments and took three writing exams. To examine students’ perceptions of parallel corpora, students in the experimental group were asked to write a self-evaluation report and answer an evaluation questionnaire. Quantitative and qualitative analysis showed significant improvement in their writing but no significant difference between groups. However, students’ perception of parallel corpora was generally positive.
Article
This paper investigates the value of computer technology as a medium for the delivery of parallel texts in English and Chinese for language learning. An English-Chinese parallel corpus was created for use in parallel concordancing -- a technique which has been developed to respond to the desire to study language in its natural contexts of use. Specific problems of dealing with Chinese characters in concordancing are discussed. A computer program called English-Chinese Parallel Concordancer was developed for this research. The operation of the program is demonstrated through screen shots. The pedagogical application of parallel concordancing in English and Chinese is illustrated through examples from some teaching and learning experiments, and the Data-Driven Learning approach is applied and explored. It is hoped that parallel concordancing in English and Chinese will become a useful and popular tool for both English and Chinese learners in their second language learning.
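Parallel concordancing of the kind described above can be sketched in a few lines. The example below is an illustration only, not the English-Chinese Parallel Concordancer from the paper: the three sentence pairs are invented toy data, and `AlignedPair` and `concordance` are hypothetical names. It assumes one issue a concordancer for Chinese has to handle, namely that Chinese text has no spaces between words, so the query is matched as a character substring rather than as a whitespace-delimited token; whether this is among the problems the paper discusses is not stated in the abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlignedPair:
    """One sentence-aligned unit of a parallel corpus."""
    english: str
    chinese: str


# Toy, hand-made sentence pairs (illustrative only, not taken from any real corpus).
CORPUS = [
    AlignedPair("I like to drink tea.", "我喜欢喝茶。"),
    AlignedPair("She drinks coffee every morning.", "她每天早上喝咖啡。"),
    AlignedPair("He is reading a book.", "他在看书。"),
]


def concordance(corpus, query, side="chinese"):
    """Return every aligned pair whose text on `side` contains `query`.

    English matching is case-insensitive; Chinese matching is a plain
    character-substring test because there are no word spaces to split on.
    """
    if side not in ("english", "chinese"):
        raise ValueError("side must be 'english' or 'chinese'")
    hits = []
    for pair in corpus:
        text = getattr(pair, side)
        if side == "english":
            found = query.lower() in text.lower()
        else:
            found = query in text
        if found:
            hits.append(pair)
    return hits


# Show each hit with its translation, as a learner would see it.
for hit in concordance(CORPUS, "喝"):
    print(hit.chinese, "|", hit.english)
```

Returning the whole aligned pair, rather than only the matching side, is what lets a learner read the search term in context next to its translation.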
Article
Corpus-aided language pedagogy is one of the central application areas of corpus methodologies, and a test bed for theories of language and learning. This volume provides an overview of current trends, offering methodological and theoretical position statements along with results from empirical studies. The relationship between corpora and learning is examined from complementary perspectives — the study of learner language, the didactic use of corpus findings, and the interaction between corpora and their users. Reflections on current theory and technology open and close the volume. With its focus on the learner and the learning setting, Corpora and Language Learners is addressed to corpus linguists with an interest in learner language, applied linguists wishing to expand their understanding of corpora and their pedagogic potential, and language teachers wishing to critically assess the relevance of work in this field. This volume grew out of selected presentations at the 5th Teaching and Language Corpora conference in Bertinoro, Italy.
Article
This paper starts by discussing the reasons why linguists should be interested in parallel corpora. I outline the questions that parallel corpora enable us to ask, and relate them to traditional questions in linguistics and translation theory. The paper then suggests a method for arriving at answers to some of these questions. The proposed method builds on the notion of “modulation” from Vinay and Darbelnet (1958) and attempts to put this notion on a sounder theoretical and empirical basis. It also includes a method of sharing data from different language pairs in a “Contrastive Linguistic Database”.
Conference Paper
Parallel corpora encode extremely valuable linguistic knowledge, the revealing of which is facilitated by the recent advances in multilingual corpus linguistics. The linguistic decisions made by the human translators in order to faithfully convey the meaning of the source text can be traced and used as evidence on linguistic facts which, in a monolingual context, might be unavailable to (or overlooked by) a computer program. Multilingual technologies, which to a large extent are language independent, provide a powerful support for systematic and consistent cross-lingual studies and allow for easier building of annotated linguistic resources for languages where such resources are scarce or missing. In this paper we will briefly present some underlying multilingual technologies and methodologies we developed for exploiting parallel corpora and we will discuss their relevance for crosslinguistic studies and applications.
Article
This paper discusses the use of concordances in the classroom, with particular reference to the pedagogical implications of the differences between parallel and monolingual concordances. Examples are given of using the two kinds of concordances in activities that involve language production, reception, correction, and testing. It is concluded that monolingual and parallel concordances have non-conflicting, complementary roles to play.
Article
This pilot study set out to determine whether a parallel corpus and a concordancer would be appropriate tools to supplement a teaching programme of German at the beginners' level in an unsupervised environment. In this instance, a beginner student of German was asked to find satisfactory answers to unknown vocabulary and formulate appropriate grammar rules for himself using the parallel corpus and concordancer as the only tools. It is shown that these tools can be of great benefit for beginners.
Article
A report is given on the Oslo Multilingual Corpus, with special reference to a new trilingual project focusing on English, Norwegian, and German. As an example, the paper examines the English verb spend and its correspondences in Norwegian and German. Correspondences are either syntactically congruent, usually containing the Norwegian verb tilbringe or the German verb verbringen, or they involve a restructuring of the clause. The patterns of correspondence are broadly comparable in Norwegian and German. Although there is a great deal of restructuring, there is also evidence of overuse of congruent structures. The findings testify to the usefulness of research based on multilingual corpora.
Article
Based on a relatively simple but innovative idea of inserting hyperlinks at the sentence level between parallel texts, a bilingual corpus of legal and documentary texts in English and Chinese has been created and made available online together with a web-based concordancer. In addition to introducing such a corpus, this paper reports a study which seeks to evaluate the usefulness of the corpus in the self-learning of legal English. The subjects involved were a group of Chinese students doing a degree in Translation at a university in Hong Kong, where English Common Law is still used after the handover in 1997, when sovereignty over Hong Kong was transferred from Britain to China. The instruments for data collection included two comprehension tasks, a questionnaire and a follow-up interview. Findings of the study indicate that students considered the bilingual corpus useful, as they needed both language versions to understand legal provisions, though they were found to rely more on Chinese. Interesting data on how users of the bilingual corpus switched between the two languages have also been obtained. This paper also investigates how the inherent characteristics of legal English contribute to the comprehension difficulty of L2 learners irrespective of the help obtained from the bilingual corpus.
Conference Paper
This paper presents a language-independent context-based sentence alignment technique given parallel corpora. We can view the problem of aligning sentences as finding translations of sentences chosen from different sources. Unlike current approaches which rely on pre-defined features and models, our algorithm employs features derived from the distributional properties of words and does not use any language dependent knowledge. We make use of the context of sentences and the notion of Zipfian word vectors which effectively models the distributional properties of words in a given sentence. We accept the context to be the frame in which the reasoning about sentence alignment is done. We evaluate the performance of our system based on two different measures: sentence alignment accuracy and sentence alignment coverage. We compare the performance of our system with commonly used sentence alignment systems and show that our system performs 1.2149 to 1.6022 times better in reducing the error rate in alignment accuracy and coverage for moderately sized corpora.