Chapter

Lexical bundles in EU law: The impact of translation process on the patterning of legal language

Abstract

The chapter investigates the behavior of multi-word patterns in legal translation. Its objective is to explore the patterning of translator-mediated multilingual legislation with a view to gaining a better understanding of how the translation process and multilingualism constraints affect the formulaicity of legal language and examining to what extent it is possible to recreate (or rather ‘prime’) the typical patterning of legal language in translation. The chapter will test the hypothesis that translations are less patterned and less formulaic than originals and corresponding nontranslated texts in the target language. This goal will be pursued by analyzing lexical bundles (n-grams) in the Polish-language version of EU law against the English-language version of EU law and the reference corpus of Polish Domestic Law. © 2018 selection and editorial matter, Stanisław Goźdź-Roszkowski and Gianluca Pontrandolfo; individual chapters, the contributors.
... In 2018, she expanded her research with the analysis of the English Eurolect (see inter alia publications nos. 13, 33, 42). Her recent chapter entitled Eurolects and EU Legal Translation (see publication no. 34) surveys contemporary studies of Eurolects that share the same methodological current, namely corpus linguistics, and that focus on the development of the complex concept of Eurolects and their textual fit to domestic non-translated varieties of legal languages. ...
Article
This article consists of two sections. The first outlines the research profile of Dr hab. Łucja Biel, Prof. ucz., a Polish linguist recognised nationwide and internationally who specializes in the analysis of legal varieties of Polish and English in the context of legal translation studies, corpus linguistics and translator training. The second part contains a detailed list of publications (co-)authored or co-edited by Łucja Biel and published between 2004 and 2022.
... In this work, 4-grams are to be examined, which means that the focus will be on uninterrupted sequences of four words. Many scholars have acknowledged that n-gram is another name for lexical bundle (e.g., Allen 2010; Granger 2014; Kwary et al. 2017; Biel 2018). This type of multiword expression has usually been tackled through a frequency-based approach (Wray 2008) and has played a crucial role in linguistic production (Pawley & Syder 1983; Thomson 2017). ...
Article
The analysis of phraseology in the specialized discourse of science has sparked researchers’ interest in the last few decades, probably because the use of word groupings in specific registers can provide information about certain typical features of the genre. For instance, Gledhill (2009) explores colligations of tenses in scientific articles and discovers that the present tense is used for qualitative and empirical expressions, while the past tense provides quantitative and research-oriented descriptions; Pérez-Llantada (2014) investigates 4-word lexical bundles in research articles, finding that these multiword combinations express referential meaning and organize the text; finally, Jiménez-Navarro (2019) analyzes adjective + noun collocations in a corpus of scientific papers and concludes that these phraseological units convey specific meanings when used in this genre, since they represent the contents of research articles. The aim of the current study is to contribute to the analysis of 4-grams in the language of science. To this end, two specific objectives are defined: first, to ascertain the structure of 4-grams; second, to analyze the function they perform. The methodology was based on a corpus and entailed five major steps: (1) a specialized corpus of research articles was built, (2) a list of 4-grams was automatically extracted using the software Sketch Engine, (3) the resulting list was manually verified in order to suppress inaccurate candidates, (4) the selected units were classified depending on their structural framework, and (5) the selected units were categorized according to their function in the text. The findings show that, in terms of the first objective, the most typical 4-grams were noun phrases; and as for the second objective, the sequences examined mostly concerned the research conducted and the authorship of the texts. 
All in all, the 4-grams identified were structures that were specific to the genre under study but could also be used in other domains.
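The automatic extraction step described above can be illustrated with a minimal sketch. It is not the Sketch Engine procedure used in the study: the whitespace tokenisation, the function names and the `min_freq` threshold are all assumptions made for illustration, and the sample sentences are invented.

```python
from collections import Counter


def extract_ngrams(tokens, n=4):
    """Return every contiguous n-word sequence in a token list as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def frequent_ngrams(texts, n=4, min_freq=2):
    """Count n-grams text by text (never across text boundaries) and keep
    those that occur at least min_freq times in the whole collection."""
    counts = Counter()
    for text in texts:
        tokens = text.lower().split()  # naive tokenisation (assumption)
        counts.update(extract_ngrams(tokens, n))
    return {gram: freq for gram, freq in counts.items() if freq >= min_freq}


# Invented sample sentences, for illustration only.
sample_texts = [
    "the results of the study show that the results of the study differ",
    "we report the results of the study in detail",
]
bundles = frequent_ngrams(sample_texts, n=4, min_freq=2)
```

A candidate list produced this way would still need the manual verification, structural classification and functional categorization that the study describes as its later steps.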
... (accessed 12 July 2021). (Biel 2018). Finally, based on the first two steps of corpus analysis, the authors further explain the discursive practices in the broader context of the U.S. culture and society. ...
Article
This study combines quantitative and qualitative discourse analysis, applying the discourse-historical approach (DHA) to a self-built specialized corpus of U.S. laws, policies and strategy documents that are directly related to critical information infrastructure protection (CIIP). A frequency-ranked word list suggests that a coherent securitizing system has formed in U.S. CIIP legislative practice, with some specific considerations in the CIIP policy-making process, including the strategy for risk management. Further investigation of the internal institutional relationships and institutional mechanisms with corpus tools reveals four discursive features and strategies of U.S. CIIP institutional discourse: the leading role of private and specific institutions in public-private cooperation; the coexistence of generality and precision in the process of object definition; the center-divergent institutional settings for CIIP execution; and the coordinating discourse patterns for CIIP within U.S. legislation. These discursive practices, which concern different institutional actors, can be further explained in the broader context of U.S. social reality. The study not only helps to clarify legal practice in U.S. cybersecurity but also offers insights on CIIP legislation to policy makers in other countries and at the international level.
... One of these formulaic structures in linguistics has been studied as so-called lexical bundles [13], such as 'on the other hand', 'as can be seen' or 'it is recommended that'. Researchers focus predominantly on more or less specialized discourses: academic discourse [14], medical leaflets [15] and legal texts, the latter from perspectives such as genre [16], linguistic structure [6], translation strategies [17] or legal semantics [18]. ...
Article
The paper follows the tradition of research in legal linguistics and into formulaic language, specifically lexical bundles. Its aim is to describe lexical bundles in samples from the corpus of Slovak judicial decisions OD-JUSTICE by means of quantitative characteristics of the identified bundles and by comparing them with bundles found in two other specialized corpora: the corpus of Slovak legal regulations and the corpus of annual reports by Slovak public institutions. For the identification of bundles, the concept of the h-point was used. The identified bundles are described with respect to their maximal, minimal, average, median and mode values, distributions and ratios. The paper also outlines an interpretation of these bundle characteristics with regard to the communicative function(s) of the compared document genres.
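The h-point is commonly described as the point in a rank-frequency distribution at which rank and frequency coincide. The abstract does not say how the paper computes it, so the following is only a sketch under a stated assumption: it uses a discrete approximation (the largest rank whose frequency is at least as large as the rank) and does not interpolate between ranks. The function name and the sample frequencies are hypothetical.

```python
def h_point(frequencies):
    """Discrete approximation of the h-point (assumption: no interpolation).

    Frequencies are sorted in descending order and ranked from 1; the result
    is the largest rank r whose frequency is still >= r, or 0 if there is none.
    """
    ranked = sorted(frequencies, reverse=True)
    h = 0
    for rank, freq in enumerate(ranked, start=1):
        if freq >= rank:
            h = rank
        else:
            break
    return h


# Invented bundle frequencies, for illustration only.
bundle_frequencies = [10, 8, 5, 4, 3, 1]
cutoff_rank = h_point(bundle_frequencies)
```

Under this reading, items ranked at or above the cutoff would be treated as the high-frequency core of the distribution; whether the paper applies the threshold in exactly this way is not stated in the abstract.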
... [11,89]), it is not surprising that lexical bundles have also been used to investigate the impact of translation process on the patterning of legal language. A study reported in Biel [12] explores internal variation in legislation relying on different legal corpora: a bilingual corpus of translator-mediated EU legislation (regulations and directives), the Polish Eurolect corpus, compared against two reference corpora: the English Eurolect corpus and the corpus of non-translated Polish legislation (Polish Domestic Law Corpus). This is an excellent example of how different types of corpora (multilingual vs. monolingual; translator-mediated vs. comparable) can be utilized to "account for two fundamental relations of translations: the relation to source texts and the relation to non-translated target-language texts of a comparable genre" [12: p. 15]. ...
Article
There are many different ways in which modern Corpus Linguistics can be used to enrich and broaden our understanding of legal discourse. Based on the central principle of co-occurrence and co-selection in language construction, this paper reviews current applications of Corpus Linguistics in the area of legal discourse focusing on issues ranging from phraseology, variation in legal discourse, legal translation, register and genre perspectives on legal discourse, legal discourse in forensic contexts to evaluative language in judicial settings. It revisits the notion of ‘corpus’ and it highlights the relevance of various types of legal corpora and computer tools in legal linguistic research.
Chapter
Recent years have witnessed a diversification of paradigms in legal translation studies. However, few studies have described the characteristics of translated legal texts on the basis of large-scale translated data. This chapter discusses the corpus approaches adopted in legal translation and the role that corpora play in top-down and bottom-up legal translation research. Drawing on the cases in a bilingual parallel corpus, we investigate the probabilistic tendencies in legal translation and provide a probabilistic explanation of legal translation equivalence inconsistencies, thereby shedding light on situational and contextual variations of linguistic usage in legal translation. Our findings show that the legal translation equivalents tend to follow Zipf’s law and converge on effortless linguistic forms in the target language. We suggest that the emergence of descriptive, bottom-up methodologies in legal translation can help to explain translated legal text patterns and translation behaviors and contribute to our understanding of top-down legal translation norms and strategies.
Article
While the plain language movement has shed light on the lack of readability of statutory texts for the lay person, there has been a lack of empirical methodology employed to determine the ways in which statutory language differs lexico-grammatically from forms of popular language that are familiar to the lay person. With this in mind, the present study conducts a comparative analysis of statutory language and other forms of popular written language (i.e., a corpus of news reports, sports reports, encyclopedia articles, and historical articles) with two goals: 1) to provide a detailed lexico-grammatical description of statutory law independent from other forms of legal writing, and 2) to identify pervasive lexico-grammatical features of statutory language that the lay person has relatively less exposure to in comparison to other written registers. Following a bottom-up selection of lexico-grammatical features for analysis, a key feature analysis is used to identify linguistic features that are more pervasive in statutory law relative to other forms of popular written language as measured through Cohen’s d effect sizes. Results reveal the pervasive use of the passive voice, prepositions, a variety of coordinating conjunctions, the pied-piping wh-relative clause construction, and non-finite -ing and -ed clause constructions in statutory language. These results complement previous research regarding the features that are characteristic of statutory language and help to identify features that potentially contribute to the lack of readability of statutory law.
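The effect-size measure named in this abstract can be sketched as follows. The abstract does not specify which variant of Cohen's d was used, so the pooled-standard-deviation form below is an assumption, as are the function name and the invented per-text feature rates.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(sample_a, sample_b):
    """Cohen's d with a pooled standard deviation (assumed variant).

    Each sample is a list of per-text rates of one feature; each needs at
    least two values so that the sample variance is defined.
    """
    n_a, n_b = len(sample_a), len(sample_b)
    pooled_var = (
        (n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)
    ) / (n_a + n_b - 2)
    return (mean(sample_a) - mean(sample_b)) / sqrt(pooled_var)


# Invented rates of one feature per text, for illustration only.
statutory_rates = [2.0, 4.0, 6.0]
popular_rates = [1.0, 3.0, 5.0]
effect = cohens_d(statutory_rates, popular_rates)
```

A positive value indicates that the feature is more frequent in the first sample; in a key feature analysis of this kind, features would then be ranked by the size of that value.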
Book
The book explores selected problems relevant to the creation of a dictionary of legal English collocations, and documents the process of developing a lexicographic description of the collocational behaviour of 100 common nouns used in legal English.
Article
We present a hybrid HMM-based PoS tagger for Old Church Slavonic. The training corpus is a portion of one text, Codex Marianus (40k), annotated with the Universal Dependencies UPOS tags in the UD-PROIEL treebank. We perform a number of experiments in within-domain and out-of-domain settings, in which the remaining part of Codex Marianus serves as a within-domain test set, and Kiev Folia is used as an out-of-domain test set. Analysing by-PoS-class precision and sensitivity in each run, we combine a simple context-free n-gram-based approach with a Hidden Markov model (HMM) and add linguistic rules for specific cases such as punctuation and digits. While the model achieves a rather unimpressive accuracy of 81% in in-domain settings, we observe an accuracy of 51% in out-of-domain evaluation, which is comparable to the results of large neural architectures based on pre-trained contextual embeddings.
Article
Evidence from the contemporary translation services market and many centuries of translation practice demonstrates that translation into a non-native language (L2 translation) can be performed effectively, despite the once-strong resistance to it on the grounds of it being perceived as unprofessional and inherently deficient. L2 translation is in fact unavoidable in the case of so-called languages of low diffusion, the command of which happens to be rather limited among native speakers of major languages. However, although the academic dispute about the validity of L2 translation seems decidedly milder now, there is still a lacuna within L2 translator training that needs to be addressed. This paper indicates that what usually betrays an L2 translation is its phraseological profile, often recognised as unnatural by native speakers of the target language. The aim of this paper is to propose a corpus-based data mining technique that may help L2 legal translator trainees become more observant with regard to the phraseological patterning of foreign legal discourse, and more self-confident in taking well-informed translation decisions.
Article
The paper explores the hypothesis that a large proportion of non-terminological word combinations in legislation is built around complex prepositions, which significantly contribute to phraseological profiles of legislative genres. The paper analyses the distribution and functions of complex prepositions in multilingual EU law and national law, on a comparative (cross-systemic) and contrastive English-Polish basis, against the background of general language. The analysis is conducted in the corpus-based methodology with the corpora of EU legislation (JRC Acquis), comprising regulations and directives, national legislation of the UK (BoLC) and of Poland (PLC), and general corpora (BNC and NKJP). The findings confirm that complex prepositions are very frequent and hence cognitively salient in the genre of legislation: complex prepositions show increased distribution against general language, in particular in Polish. It is demonstrated that national legislation and EU legislation (translationese) are profiled by different sets of salient prepositions, which may adversely affect the readability of the latter due to interference. Functionally, it has been demonstrated that the phraseological profiles of legislative instruments are marked by complex prepositions used predominantly in referencing patterns (authority, conflict), conditionals, anchoring (framing) patterns, defining patterns and time deixis.
Article
Law is characterized by formalism, especially in institutional contexts, and legal texts produced by institutional authors tend to be formulaic in nature. Although formulaic language is a feature frequently encountered in legal genres, it remains an underexplored phenomenon in legal and linguistic research. Apart from Latin phrases derived from Roman law, the role and importance of phraseology in legal language is rarely discussed by legal professionals. Yet in the process of legal translation, conducted by legal comparatists and legal translators, phraseological patterns can form a major obstacle not only to understanding foreign law, but also to creating high-quality legal translations. With regard to continental legal systems and German legal language in particular, this article examines the phenomenon of formulaicity in legal language and discusses the dependency of formulaic texts and legal phrasemes on legislation. Keywords: legal text, formulaicity, legal phrasemes, legal translation.
Chapter
Mixed corpus design for researching the Eurolect: a genre-based comparable-parallel corpus in the PL EUROLECT project
The article describes the mixed structure of a genre-based comparable-parallel corpus being built within the PL EUROLECT project funded by the National Science Centre (NCN; SONATA BIS grant, 2015-2018). The project aims to study the Polish Eurolect comprehensively, as a new hybrid variety of Polish that emerges through translation and is used in the EU context, to gain an in-depth understanding of the processes and factors that shape it, and to examine its impact on post-accession administrative Polish. The corpus will be based on a genre structure covering four genres considered representative of EU communication (legal acts, judgments, reports and official websites for citizens), divided into subgenres; for example, the corpus of legal acts will contain subcorpora of regulations, directives and decisions. The genre structure will make it possible to study the internal variation of the Eurolect and to obtain more precise quantitative data. Superimposed on the genre structure will be a bilingual parallel corpus containing aligned texts in English and Polish, a monolingual comparable corpus containing non-translated administrative texts in Polish and, as a point of reference, a balanced sample of the National Corpus of Polish. The mixed corpus structure is intended to enable the study of two fundamental relations: equivalence, that is, the relation of the Eurolect to source texts (parallel corpus), and textual fit, that is, the relation of the Eurolect to non-translated texts in the target language (comparable corpus). The corpus structure will also include a diachronic corpus of administrative Polish compiled for individual genres from the pre-accession and post-accession periods in order to study the impact of the Eurolect on the administrative variety of Polish. The quantitative data obtained will record the state of the Eurolect and of Polish across genres in specific time periods and will serve as a point of reference for other researchers. The genre-based quantitative data obtained from the corpus analysis will be triangulated with qualitative data (discourse analysis, social semiotics, comparative-law research on terminology). The methodological aim is to develop an interdisciplinary theoretical model for studying translator-mediated language varieties.
Article
Focusing on the exploration of intra-disciplinary register variation in the pharmaceutical domain, this corpus-driven study attempts to describe the use, composition and discourse functions of phrase frames, that is, contiguous sequences of words identical except for one (Fletcher, 2002-2007), found in samples of four English pharmaceutical text types: patient information leaflets, summaries of product characteristics, clinical trial protocols and chapters/sections from academic textbooks on pharmacology. The study deals with a specific sub-type of phrase frames, that is, 4-word units with a variable slot in the medial position, e.g. be * with caution, to take * medicine. The results showed, among other things, that the use and discourse functions of phrase frames vary across pharmaceutical text types, that the correlation between the frequency of phrase frames and their pattern variability may depend on a register or genre, and that it is justified to treat the discourse functions of phrase frames as distinct from those of their textual variants. The paper is available at: https://www.degruyter.com/view/j/rela.2015.13.issue-3/rela-2015-0025/rela-2015-0025.xml?format=INT
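The notion of a 4-word phrase frame with a medial variable slot can be illustrated with a short sketch. It is not the tool or procedure used in the study: the function names, the `min_variants` threshold and the sample sentence are assumptions made for illustration.

```python
from collections import defaultdict


def medial_slot_frames(tokens, n=4):
    """Map each frame (an n-gram with one medial word replaced by '*')
    to the set of words observed in that slot."""
    frames = defaultdict(set)
    for i in range(len(tokens) - n + 1):
        gram = tokens[i:i + n]
        for slot in range(1, n - 1):  # medial positions only, never first or last
            frame = tuple("*" if j == slot else word for j, word in enumerate(gram))
            frames[frame].add(gram[slot])
    return frames


def variable_frames(tokens, n=4, min_variants=2):
    """Keep only the frames whose slot is filled by at least min_variants words."""
    return {
        frame: fillers
        for frame, fillers in medial_slot_frames(tokens, n).items()
        if len(fillers) >= min_variants
    }


# Invented sample text, for illustration only.
sample_tokens = "should be used with caution and must be taken with caution".split()
frames = variable_frames(sample_tokens)
```

Here the only frame with more than one filler is `be * with caution`, realised by "used" and "taken". A real study would typically also apply frequency and dispersion thresholds, which this sketch omits.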
Chapter
Corpus linguistics is a research approach that has developed over the past few decades to support empirical investigations of language variation and use, resulting in research findings which have much greater generalizability and validity than would otherwise be feasible. Corpus studies have used two major research approaches: 'corpus-based' and 'corpus-driven'. Corpus-based research assumes the validity of linguistic forms and structures derived from linguistic theory. The primary goal of research is to analyse the systematic patterns of variation and use for those pre-defined linguistic features. Corpus-driven research is more inductive, so that the linguistic constructs themselves emerge from analysis of a corpus. This chapter illustrates the kinds of analyses and perspectives on language use possible from both corpus-based and corpus-driven approaches. © 2010 editorial matter and organization Bernd Heine and Heiko Narrog. All rights reserved.