Review
Natalia Levshina*, Savithry Namboodiripad*,
Marc Allassonnière-Tang, Mathew Kramer, Luigi Talamo,
Annemarie Verkerk, Sasha Wilmoth, Gabriela Garrido Rodriguez,
Timothy Michael Gupton, Evan Kidd, Zoey Liu, Chiara Naccarato,
Rachel Nordlinger, Anastasia Panova and Natalia Stoynova
Why we need a gradient approach to word
order
https://doi.org/10.1515/ling-2021-0098
Received May 13, 2021; accepted April 9, 2022; published online April 25, 2023
*Corresponding authors: Natalia Levshina, Max Planck Institute for Psycholinguistics, P.O. Box 310,
6500 AH Nijmegen, The Netherlands, E-mail: natalevs@gmail.com; and Savithry Namboodiripad,
Department of Linguistics, University of Michigan, 409 Lorch Hall, 611 Tappan Street, Ann Arbor, MI 48109-
1220, USA, E-mail: savithry@umich.edu. https://orcid.org/0000-0002-7685-5895
Marc Allassonnière-Tang, MNHN/CNRS/Université Paris Cité, Paris, France,
E-mail: marc.allassonniere-tang@mnhn.fr. https://orcid.org/0000-0002-9057-642X
Mathew Kramer, University of Michigan, Ann Arbor, MI, USA, E-mail: arkram@umich.edu.
https://orcid.org/0000-0002-0509-1453
Luigi Talamo and Annemarie Verkerk, Saarland University, Saarbrücken, Germany,
E-mail: luigi.talamo@uni-saarland.de (L. Talamo), annemarie.verkerk@uni-saarland.de (A. Verkerk).
https://orcid.org/0009-0009-4640-3052 (L. Talamo). https://orcid.org/0000-0002-3351-8362 (A. Verkerk)
Sasha Wilmoth and Rachel Nordlinger, University of Melbourne, Melbourne, Australia,
E-mail: sasha.wilmoth@unimelb.edu.au (S. Wilmoth), racheln@unimelb.edu.au (R. Nordlinger). https://
orcid.org/0000-0002-6626-9104 (S. Wilmoth). https://orcid.org/0000-0003-4126-8022 (R. Nordlinger)
Gabriela Garrido Rodriguez, Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands,
E-mail: Gabriela.Garrido@mpi.nl
Timothy Michael Gupton, University of Georgia, Athens, GA, USA, E-mail: gupton1@uga.edu.
https://orcid.org/0000-0003-4970-823X
Evan Kidd, MPI for Psycholinguistics, Nijmegen, The Netherlands; The Australian National University,
Canberra, Australia; and ARC Centre of Excellence for the Dynamics of Language, Canberra, Australia,
E-mail: Evan.Kidd@mpi.nl. https://orcid.org/0000-0003-4301-2290
Zoey Liu, University of Florida, Gainesville, FL, USA, E-mail: liu.ying@ufl.edu
Chiara Naccarato, Anastasia Panova and Natalia Stoynova, National Research University Higher
School of Economics, Moscow, Russia, E-mail: ch1naccarato@gmail.com (C. Naccarato),
anastasia.b.panova@gmail.com (A. Panova), stoynova@yandex.ru (N. Stoynova). https://orcid.org/0000-
0003-0017-6316 (C. Naccarato). https://orcid.org/0000-0003-0793-671X (A. Panova). https://orcid.org/
0000-0001-8979-3788 (N. Stoynova)
Linguistics 2023; 61(4): 825–883
Open Access. © 2023 the author(s), published by De Gruyter. This work is licensed under the
Creative Commons Attribution 4.0 International License.
Abstract: This article argues for a gradient approach to word order, which treats
word order preferences, both within and across languages, as a continuous variable.
Word order variability should be regarded as a basic assumption, rather than as
something exceptional. Although this approach follows naturally from the emer-
gentist usage-based view of language, we argue that it can be beneficial for all
frameworks and linguistic domains, including language acquisition, processing, ty-
pology, language contact, language evolution and change, and formal approaches.
Gradient approaches have been very fruitful in some domains, such as language
processing, but their potential is not fully realized yet. This may be due to practical
reasons. We discuss the most pressing methodological challenges in corpus-based
and experimental research of word order and propose some practical solutions.
Keywords: continuous variables; entropy; gradience; typology; variability; word
order
1 Aims of this article
1.1 What do we mean by a gradient approach?
In this article we argue for a gradient approach to word order. By advocating for a gradient
approach, we put forth two main theoretical stances. First, we argue for the presumption
of variability in word order research, or for treating variability as the null hypothesis.
Second, from a crosslinguistic perspective, we argue for a presumption of gradience; by
default, we expect that languages should vary in degree but not kind when it comes to
word order variability. From the perspective of description, a gradient approach means
that word order patterns should be treated as a continuous variable. For example, in
addition to – or instead of – labeling a language as SO (with the dominant Subject-Object
order) or OS (with the dominant Object-Subject order), we can compute and report the
proportion of SO and OS based on behavioral data (from sources such as corpora or
experiments). Similarly, instead of or in addition to labeling languages as having fixed,
flexible, or free word order, we can measure the degrees of this variability by using
quantitative measures, such as entropy, in a crosslinguistically comparable manner. Along
with increasing descriptive adequacy, this allows us to move beyond categorical claims
which stipulate that rigid word order provides a cue for assigning grammatical functions,
such as Subject and Object, to noun phrases; instead, we can measure the reliability and
strength of word order as a cue for argument structure based on behavioral data.
1 Note that our claims apply to the order of words and larger-than-word units, as well, such as
syntactic constituents and clauses; in this paper, we use “word” as a general term, and specify
constituents/clauses when relevant.
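As an informal illustration of the kind of quantitative measure mentioned above, word order variability over a sample of clauses can be summarized with Shannon entropy. The sketch below is our own and the clause counts are invented for illustration; it is not code from the article's supplementary materials.

```python
from collections import Counter
from math import log2

def order_entropy(orders):
    """Shannon entropy (in bits) of a sample of word-order tokens.

    0 bits = perfectly rigid order; log2(k) bits = maximal variability
    over k attested orders.
    """
    counts = Counter(orders)
    n = sum(counts.values())
    return -sum((c / n) * log2(c / n) for c in counts.values())

# Invented clause counts for two hypothetical languages:
rigid = ["SO"] * 97 + ["OS"] * 3      # strong SO preference
flexible = ["SO"] * 60 + ["OS"] * 40  # much more variable

print(round(order_entropy(rigid), 2))     # 0.19, low entropy
print(round(order_entropy(flexible), 2))  # 0.97, close to the 1-bit maximum
```

Because entropy is computed the same way regardless of which orders a language prefers, scores like these are directly comparable across languages, which is the crosslinguistic comparability the gradient approach requires.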
A gradient approach to the description of a particular language can be illustrated
by a simple analogy: Instead of using categorical color terms like “white”, “blue” or
“orange”, we can encode them numerically by using different codes. For example, we
can talk about a color for which there is a word in English, “orange”, in RGB terms as
being 100% red, 64.7% green, and 0% blue. In CMYK color space, it is 0% cyan, 35.3%
magenta, 100% yellow and 0% black, and it is assigned a hex code #ffa500. However,
an additional advantage of this continuous measure is that we can also describe a
color for which there is no standard label, such as #BA55D3, which is 73% red, 33%
green, and 83% blue, and is given the label “medium orchid” in the X11 color names
system. Thus, using various continuous measures, we can not only account for the
underlyingly gradient properties of color, but we can also describe more and less
prototypical colors on equal footing. Likewise, we can use cross-linguistically com-
parable gradient measures to describe a particular language, to describe how lan-
guages vary, and as a basis for understanding how linguistic variables may interact
to produce typological patterns and motivate change.
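For readers who wish to reproduce the arithmetic behind this analogy, a hex code can be decoded into the red/green/blue percentages cited above. This small helper is our own illustration, not part of any color standard:

```python
def hex_to_rgb_pct(hex_code):
    """Decode a hex color code into (red, green, blue) percentages."""
    h = hex_code.lstrip("#")
    # Each pair of hex digits is one channel on a 0-255 scale.
    return tuple(round(int(h[i:i + 2], 16) / 255 * 100, 1) for i in (0, 2, 4))

print(hex_to_rgb_pct("#ffa500"))  # orange: (100.0, 64.7, 0.0)
print(hex_to_rgb_pct("#BA55D3"))  # medium orchid: (72.9, 33.3, 82.7)
```

Like the word order proportions discussed above, the continuous encoding describes named and unnamed colors on exactly the same footing.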
1.2 Theoretical and practical concerns with gradience
Despite some previous research problematizing categorical distinctions in linguistics
(e.g., Wälchli 2009), non-gradient approaches to word order have been traditionally
dominant in cross-linguistic studies (e.g., Dryer 1992, 2013a, 2013b; Greenberg 1963;
Lehmann 1973; Vennemann 1974). In contemporary quantitative typology and
Bayesian phylogenetic models of language evolution, word order is still usually treated
categorically (Cathcart et al. 2018; Dunn et al. 2011; Greenhill et al. 2017; Hammarström
2015; Jäger and Wahle 2021; Maurits and Griffiths 2014). Popular large-scale cross-
linguistic databases, such as WALS (Dryer and Haspelmath 2013), AUTOTYP (Bickel et
al. 2022), Grambank,2 and DiACL (Carling 2017), provide categorical word-order data.
There are both theoretical and practical reasons for this situation. From a
theoretical perspective, a widespread view has been that grammar in general and
word order in particular should be described by discrete features, categories, rules,
or parameters (e.g., Guardiano and Longobardi 2005; Jackendoff 1977; Lightfoot 1982;
Vennemann and Harlow 1977), although probabilistic approaches to grammar are
becoming increasingly influential (Bod et al. 2003; Bresnan et al. 2007; Grafmiller
et al. 2018; also see Section 1.2.2). A significant practical barrier to adopting a gradient
perspective is that it is still difficult to obtain the types of behavioral or corpus data in
many languages which would be necessary for descriptions using gradient mea-
sures. However, these barriers are rapidly falling away; we have access to new data
2 www.glottobank.org (accessed 1 March 2022).
sources in the form of large corpora, as well as software for processing and analyzing
them statistically. To take one example, the Universal Dependencies project (UD;
Zeman et al. 2020) had 10 treebanks for 10 languages in 2015. Version 2.9, released in
November 2021, contained 217 treebanks for 122 languages, with more in the works.
Computational algorithms, such as multilingual word alignment applied to massively
parallel corpora, can be used to scale up gradient approaches, providing word order
information for almost a thousand languages (Östling 2015). The proliferation of
quantitative approaches to different areas of linguistics (Janda 2013; Kortmann 2020),
which has been possible thanks to the development of user-friendly statistical soft-
ware and robust experimental methods, has made us better equipped than ever for
investigating gradient phenomena, though many practical issues are far from being
fully solved, as we show in Section 4.
1.2.1 Gradience across generative approaches
Even in approaches where, historically, gradience has not been a central concern,
various strategies have emerged to account for variability in word and constituent
order. With the recent diversification of methodologies in the generative paradigm,
researchers are recognizing the need to account for gradience and variability in theory
building.3
This section first discusses so-called ‘mainstream’ generative approaches (cf.
Culicover and Jackendoff 2006) and accounts therein of gradience and variability in
order. Then, we look to other major generative approaches, namely Lexical-Functional
Grammar and Head-Driven Phrase Structure Grammar, in which accounting for
variation in word/constituent order has played a central role in theory construction.
We review these here as part of demonstrating why gradience might not have been as
central to previous accounts of word order flexibility; the theoretical moves we
advocate for in this article are certainly compatible with a range of approaches to
syntax, though the details of implementation and, perhaps most crucially, how
interested practitioners might be in the associated questions, will of course vary.
Historically, gradience in production or speaker judgments was explained as a
difference among grammars: in short, crosslinguistic variation related to proposed
parametric differences (e.g., Chomsky 1986). Although the Minimalist Program’s
feature-checking model generates canonical word orders, subject position, as well as
derived word orders in a step-wise, phase-based derivation (though cf. Fanselow 2009),
optionality and gradience are problematic and often not assumed to exist.4 There is
3 For an experimental perspective, see, e.g., Leal and Gupton (2021). For a review of the micro-
parametric approach and its application to diachronic studies, see, e.g., Roberts (2019).
4 For the Minimalist Program, see Chomsky (1995, 2001, 2008). See Preminger (2017) for arguments in
favor of a feature-valuation model over a feature-checking model.
work that aims to account for insights related to frequency and innovation in inter-
generational language transmission and acquisition (e.g., Biberauer 2019; Gravely
2021; Yang 2002) or from variationist sociolinguistics (e.g., Adger and Smith 2005, 2010)
by combining the assumptions and theoretical machinery of the Minimalist Program
with probabilistic analyses of variable word order. In this type of work, variability can
be accounted for by gradient or stochastic probabilistic constraints on lexical/feature
selection, motivated by a variety of factors, including functional ones. However,
because the organizing principle of this framework is to reduce the specificity of
syntactic operations, the types of processes which are relevant to account for different
orders within languages or varieties of languages, for example, are often explicitly
constructed as being extra- or post-syntactic operations – a theoretical conundrum,
given that such processes are reflected in the syntax itself. As such, analyses which
include variation or gradience are necessarily about interfaces (as this is a modular
framework) with other levels of linguistic analysis.
Syntactic variation related to discourse-information structure is one particularly
relevant example, and it presents one of the biggest challenges for the Y-model of
language (e.g., Irurtzun 2009). Although no crosslinguistic model accounting for
variability due to information structure currently exists in this framework,
numerous attempts have centered on particular phenomena or languages.5 Exam-
ples are Rizzi’s (1997) Cartographic Program, Frascarelli and Hinterhölzl’s (2007)
isomorphism of syntax, information structure, and intonation, and Zubizarreta
(1998), who elegantly accounts for syntax, prosody, and information structure in
Germanic and Romance without the split-CP architecture. However, they are not
unproblematic. In particular, recent experimental tests of Spanish varieties find
variation not predicted by Zubizarreta’s (1998) account, especially for subject in-
formation focus (i.e., rheme).6 In fact, Feldhausen and Vanrell (2015), which builds
upon Zubizarreta’s account, does so using a non-categorical approach, namely Sto-
chastic Optimality Theory (Boersma and Hayes 2001).
Indeed, Optimality Theoretic approaches have been a productive space for
generative syntacticians interested in making variation (be it gradient or stochastic)
more central to syntactic analyses. Ortega-Santos (2016) is a compelling application of
an Optimality Theoretic approach to focus-related word order variation in Spanish.
Müller (2019) uses a Gradient Harmonic Grammar (Smolensky and Goldrick 2016)
approach to shed light on extraposed infinitive constructions in German, showing
5 See, e.g., Culicover and Rochemont (1983) on focus and stress in English, Horvath (1986) on focus in
Hungarian, and López (2009) on a model for Catalan and Spanish which also integrates information
structure, and Williams (1977) on VP anaphora in English.
6 See, e.g., Gabriel (2010) for Argentine Spanish; Hoot (2016) for Northern Mexican Spanish; Jiménez-
Fernández (2015) for Andalusian Spanish; Leal et al. (2018) for Chilean and Mexican Spanish, as well
as Spanish-English bilinguals living in the US; Muntendam (2013) for Andean Spanish.
how this approach can explain apparent variable strength in the CP realm. Since
Syntactic Structures (Chomsky 1957), the vast majority of generative approaches have
sought descriptive as well as explanatory adequacy; therefore, not assuming or
accounting for gradience means not capturing an important portion of speaker
competence when it comes to word order. The result is a theoretical model that is, at
best, incomplete, and, at worst, incorrect.
The question of how to deal with languages with flexible word order has played
an important role in how non-hegemonic generative approaches to syntax diverged
from other generative approaches. For example, Lexical-Functional Grammar (LFG)
is characterized by a division between constituent structure and functional struc-
ture, and this has been used specifically to deal with languages with highly flexible
constituent order (Austin and Bresnan 1996; Dalrymple et al. 2019). In LFG, languages
are not a priori required to be specified for particular orders, and languages such as
Plains Cree or Malayalam are argued to be non-configurational (Dahlstrom 1986;
Mohanan 1983; see also Nordlinger 1998 and Simpson 2012 for discussion of non-
configurational Australian languages). Accounting for flexible constituent order
languages was not as central to the development of Head-Driven Phrase Structure
Grammar (HPSG), in which linearization of constituents is similarly separate from
constituent structure (Wechsler and Asudeh 2021), but, as HPSG has been imple-
mented in a variety of languages with flexible constituent order, the framework is
flexible enough to account for the gradience seen in natural language (e.g., Fujinami
1996 for Japanese; Mahmud and Khan 2007 for Bangla; Müller 1999 for German;
Simov et al. 2004 for Bulgarian). While gradience is not central to these approaches,
the existence of flexibility in constituent order, and, relatedly, whether and where
grammatical relations are specified in formalisms and/or speakers’ knowledge,7 has
not only been addressed in these frameworks, but dealing with word order flexibility
has been the source of direct cross-framework comparison. What remains under-
developed in these accounts is the ability to explain and measure degree of flexibility.
1.2.2 Gradience in usage-based linguistics
Though all theoretical approaches must deal with it in some way, word order gra-
dience follows naturally from approaches that assume a dynamic usage-based view
of grammar (e.g., Bybee 2010; Diessel 2019). This can be illustrated by grammatical-
ization processes. For example, English is mostly a prepositional language. Since
many prepositions represent a result of the grammaticalization of verbs, and English
has predominantly VO order, the normal outcome of this process is the development
7 For a more recent discussion of this general issue which compares formal to cognitive-functional
perspectives, see Johnston (2019), who argues that grammatical relations are not specified in AusLan.
of prepositions. However, English also has a few postpositions, such as ago and
notwithstanding. Nouns that are now used with those postpositions were originally
the subjects of the source verbs (Dryer 2019). The bottom-up emergence of grammar
from language use thus is very likely to result in variability/construction-specific
idiosyncrasies, such that a categorical approach of applying labels to languages as a
whole is less accurate than a gradient approach, which can quantify degrees of
divergence from some canonical extreme.
The usage-based approach assumes that the user’s knowledge of grammar is
probabilistic (Bod et al. 2003; Grafmiller et al. 2018) because it is derived and updated
based on exemplars of usage events stored in the memory (Bybee 2010), from which
more abstract generalizations can be formed (Goldberg 2006). Individual language
users implicitly learn statistical variability, or “soft constraints”(Bresnan et al. 2001),
from the input (MacDonald 2013). This view is supported by the fact that language
users are able to predict the likelihood of a variant in a specific context, closely
matching predictions based on corpora (Bresnan and Ford 2010; Klavan and Divjak
2016). The probabilistic variation is captured by complex statistical models of lan-
guage users’ behavior (e.g., Bresnan et al. 2007; Gries 2003; Szmrecsanyi et al. 2016, to
name just a few). Gradience thus forms a part of mental grammar, and categories are
emergent or epiphenomenal.
Both word order variability and rigidity in such a framework result from
competition between factors involved in language processing and learning, which
can have different weights in one language and across languages (see Section 2.1). In
this sense, the probabilistic view has much in common with Optimality Theory (see
Lee 2003; Keller 2006, et alia for work which deals directly with word order).8 The
difference is that probabilistic grammars do not assume a fixed set of innate con-
straints (Grafmiller et al. 2018). This variability is instead represented in the indi-
vidual speaker’s grammar in the form of sequential associations of different
strength, depending on the degree of entrenchment of a sequence in an individual
mind (Diessel 2019), which both depends on and determines the degree of conven-
tionalization of this sequence in the community (Schmid 2020).
1.3 Moving from categories to gradience in language description
In this article, we argue that a gradient approach is more descriptively adequate than
the approaches which rely on categorical labels like “VO” and “OV”, or “rigid/fixed”,
“flexible” and “dominant”. In particular, rigid order means that some orders are
8 See also the Competing Grammars Framework, Kroch (1989), formalized in Kauhanen and
Walkden (2018).
“either ungrammatical or used relatively infrequently and only in special pragmatic
contexts” (Dryer 2013b). If different possible word orders are grammatical, languages
have flexible order. Many flexible-order languages are claimed to have a dominant
word order, i.e., the more frequent one (e.g., the order that is used at least twice as
frequently as any other order is considered dominant); in the absence of frequency
data, or in languages where the dominant order might differ based on contexts such
as register or genre (see Payne 1992), this could also be the order labeled as “basic”,
“pragmatically neutral” or “canonical” in language descriptions.
However, as has been noted (e.g., Hale 1983; Salzmann 2004), a clear-cut distinction
between rigid-order languages and flexible-order languages with a dominant order is
problematic. First of all, pragmatic neutrality is a slippery notion that depends on the
specific situation. In spontaneous informal conversations in Russian, for example,
there is nothing pragmatically marked about putting a newsworthy object first. For
example, (1a), which comes from a transcribed spoken text, has OSV. Compare it with
(1b) taken from an online news report, which has SVO, or given–new order, which is
pragmatically neutral here. See more on register and modality effects in Section 4.1.3.
(1) Russian
a. Spontaneous conversation
Čajnik ja postavi-l-a.
kettle.ACC 1SG.NOM set.PFV-PST-SG.F
‘I set the kettle.’
(Zemskaja and Kapanadze 1978: “A day in a family”)
b. Online news
Ona postavi-l-a čajnik na plit-u…
3SG.F.NOM set.PFV-PST-SG.F kettle.ACC on stove-ACC
‘She set the kettle on the stove…(and forgot about it)’
(mir24.tv)
The second criterion, text frequency, is also problematic. Converting a continuous
measure to a categorical one means loss of data, and there is not always a clear cut-off
point. Consider an illustration. Using corpora of online news in the Leipzig Corpora
Collection (Goldhahn et al. 2012) in 31 languages, annotated with the Universal De-
pendencies (Zeman et al. 2020), we computed the proportions of Subject–Object
order in the total number of clauses in which both arguments were expressed by
common nouns. The sample sizes were very large, with a median number of about
138,000 clauses. The data are available online as Dataset1.txt in an OSF directory.9
The result is shown in Figure 1. In all 31 languages, Subject usually comes before
Object. The distribution of the scores represents a continuum from flexible languages
9 https://osf.io/6m7ec/?view_only=1eb102ddd91d41dcaeaa0f266e5eab79.
(Lithuanian, Hungarian, Tamil, Latvian, and Czech) to well-known rigid languages
(e.g., Indonesian, French, English, Norwegian, Danish, and Swedish). This plot
demonstrates that there is no clear boundary between the two types, so it is difficult
to choose the most appropriate cut-off point.
Figure 1: Proportions of the Subject first, Object second order in sentences with nominal Subject and
Object in online news corpora. The lines delimit the 95% confidence intervals. Due to large samples, they
are very narrow.
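The kind of estimate plotted in Figure 1 can be sketched with a simple Wald interval for a binomial proportion. The function below is our own outline, and the counts are invented, merely on the order of the sample sizes mentioned above:

```python
from math import sqrt

def so_proportion_ci(n_so, n_total, z=1.96):
    """Proportion of Subject-before-Object clauses with a Wald 95% CI."""
    p = n_so / n_total
    se = sqrt(p * (1 - p) / n_total)  # standard error of the proportion
    return p, max(0.0, p - z * se), min(1.0, p + z * se)

# Invented counts, roughly at the scale of the news corpora described above:
p, low, high = so_proportion_ci(n_so=120_000, n_total=138_000)
print(round(p, 3), round(high - low, 4))  # 0.87 0.0036 (a very narrow interval)
```

With samples in the hundreds of thousands of clauses, the interval width shrinks toward zero, which is why the confidence bands in Figure 1 are barely visible.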
Approaches such as Dryer (2013a), which includes languages without a dominant
order, resulting in a simple three-way scale (e.g., Adjective–Noun, no dominant
order, Noun–Adjective), and Siewierska (1998), which determines relative flexibility
based on the number of grammatical orders in a language, can be seen as precursors
of gradient approaches. Here, we advocate for taking the next logical step and
moving to more fully continuous variables, as these are not only descriptively more
adequate, they also allow us to ask and answer more questions about word order, as
discussed in more detail in Section 3. A gradient approach may not always lead to
dramatically different results from a categorical one. For example, word order
correlations, e.g., the correlation between the order of verb and object, and the order
of adpositions and nouns, would still be correct under the gradient approach because
both orders display low variability in many languages (Levshina 2019; see also Sec-
tion 3.5). Yet, we can investigate more linguistic phenomena and languages if we
characterize word order with the help of continuous measures.
Note that we do not argue for completely banishing descriptive labels like
“canonically SVO language.” Such labels can still be useful as a shortcut, especially in
the absence of more precise measures. However, under a gradient approach, we
make explicit that these labels reflect convenient simplifications rather than rep-
resenting inherent properties of particular languages. Reiterating our two main
arguments, presuming within-language variability and cross-linguistic gradience
when it comes to typological categorization, we posit that gradience itself might be a
good candidate for an inherent property of language.
In some cases, conventional categories can be misleading. For example, the plot
in Figure 2 displays the distribution of proportions of head-final phrases in 123
corpora annotated with Surface-syntactic Universal Dependencies (Osborne and
Gerdes 2019).10
The dataset is available as Dataset2.txt in the OSF directory, which
also contains the Python script that was used for the data extraction. The plot shows
that the corpora do not follow a distribution with two peaks at each end, confirming
the results obtained by Liu (2010) based on a small sample of languages. In other
words, the corpora are not strongly head-initial and only rarely strongly head-final
(cf. Polinsky 2012). In fact, the main bulk of the corpora contains between 25% and
50% of head-final phrases. The conventional labels mask the asymmetric and
gradient shape of the distribution, which is best represented by continuous
measures.
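The measure behind Figure 2 can be approximated at the level of individual dependency relations. The toy sketch below is ours and simplifies the phrase-level counts used here; it assumes (head position, dependent position) pairs extracted from a parsed sentence, not the actual extraction script in the OSF directory:

```python
def head_final_proportion(deps):
    """Share of dependencies whose dependent precedes its head.

    `deps` holds (head_index, dependent_index) pairs, i.e. 1-based word
    positions in the sentence; dep < head means a head-final relation.
    """
    head_final = sum(1 for head, dep in deps if dep < head)
    return head_final / len(deps)

# Toy parse of "the cat slept": 'the' depends on 'cat', 'cat' on 'slept'.
deps = [(2, 1), (3, 2)]
print(head_final_proportion(deps))  # 1.0, both dependents precede their heads
```

Aggregating this proportion over all sentences of a treebank yields one continuous score per corpus, which is what the x-axis of Figure 2 summarizes.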
Studying gradience requires interdisciplinarity. In order to understand why word
order exhibits more or less variability, we need to consider diverse factors related to
language acquisition, processing, language contact, language change, prosody, and many
10 The corpora can be downloaded from https://grew.fr/download/sud-treebanks-v2.8.tgz. Only the
training corpora were taken. Some languages were represented by more than one corpus.
others. These are discussed in Section 2. In Section 3, we discuss the research questions in
different linguistic domains that cannot be asked and answered without taking a
gradient approach. In order to study word order as a gradient phenomenon, we need
quantitative measures of variability, such as probabilities or entropy, and empirical data
from experiments or corpora. We suggest some methodological solutions and discuss the
main challenges in Section 4. Finally, Section 5 provides conclusions and poses pertinent
questions for future research across different linguistic disciplines.
2 What conditions word order gradience?
The aim of this section is to identify the main factors that contribute to the emergence
of gradient word order patterns, and which make it possible for those patterns to
remain in a language community. We begin with individual cognitive processes
leading to gradient patterns, reviewing a large body of experimental and corpus-
based work on language processing and acquisition (Sections 2.1 and 2.2). After that,
we move on to word order gradience at the level of communities, focusing on
Figure 2: Distribution of head-finalness in SUD corpora.
language change (Section 2.3) and the processes of language contact, which interact
with language variation in vernaculars (Section 2.4).
2.1 Language processing
There is a large body of psycholinguistic work that has investigated what conditions
word order variability at the level of an individual language user and learner. An
important role is played by the accessibility of information expressed by constitu-
ents, depending on their semantics, information status, and formal weight. It is
advantageous for the speaker/signer to place more accessible information first
because it helps to save time for planning the less accessible parts. Specifically, there
is a general preference to place human (and, more broadly, animate) referents in
early appearing and/or prominent sentence positions (i.e., to realize them as gram-
matical subjects) (Branigan et al. 2008; see also Meir et al. 2017, who found a “me first”
principle across three groups of sign languages and in elicited pantomime). The
explanation for these effects on word order concern conceptual accessibility: humans
find it easier to access concepts denoting animate entities along with their linguistic
labels from memory (Bock and Warren 1985). Accordingly, Tanaka et al. (2011) re-
ported that Japanese-speaking adults were more likely to recall OSV sentences as
having SOV word order when this resulted in an animate entity appearing before an
inanimate entity. Thus, sentence (2a) was often recalled as (2b).
(2) Japanese
a. minato de, booto-o ryoshi-ga hakonda
harbor in boat-ACC fisherman-NOM carried
‘In the harbor, the fisherman carried the boat.’
(Tanaka et al. 2011: 322)
b. minato de, ryoshi-ga booto-o hakonda
harbor in fisherman-NOM boat-ACC carried
‘In the harbor, the fisherman carried the boat.’
(Tanaka et al. 2011: 322)
This effect may ultimately have its origin in preferred modes of event construal.
Studies of scene perception and sentence planning using eye-tracking show that
speakers can rapidly extract information about participant roles in events, with
speakers fixating on characters and entities that are informative about the scene as a
whole (Konopka 2019).
Constituent order is also influenced by the discourse status of referents. The
earliest proposal for this principle was perhaps by Behaghel (1932: 4): “es stehen die
alten Begriffe vor den neuen” [old concepts come before new ones]. This general-
ization was later captured as “given before new” by Gundel (1988). The effects of
discourse information status have been empirically attested in various contexts. For
example, the production experiments of Bock and Irwin (1980) showed that, for
English speakers, given information tends to be produced earlier. Similar results
have been replicated in Ferreira and Yoshita (2003) with Japanese. Arnold et al.
(2000) demonstrated that in both heavy NP shift and the dative alternation in English,
speakers prefer to produce the relatively newer constituent later. The influence of
information structure can in turn affect ordering flexibility, yielding a more fixed
preference for given information to appear first (for example, Israeli Sign Language
has been said to have a Topic-Comment order [Rosenstein 2001], see also McIntire
[1982] on American Sign Language). Although most languages discussed in that re-
gard have predominantly given-before-new order, some languages prefer to put new
and/or newsworthy information first, e.g., Biblical Hebrew, Cayuga (Iroquoian, the
USA), Ngandi (Gunwinyguan, Australia), and Ute (Uto-Aztecan, the USA) (Mithun
1992). This can be explained by the competing principle: “More important or urgent
information tends to be placed first in the string”(Givón 1991: 972). But manifesta-
tions of this principle can also be found in languages with given-first order, e.g., the
clause-initial placement of contrastive topic and focus, but also of full nominal
phrases, as in (1a). Crucially, these principles interact, and they are often strong
tendencies as opposed to absolute principles; competition between factors such as
"given first" and "important first" can lead to gradience effects within and across
languages.
Intonation plays an important role in these processes. Across languages, we see
that less frequent orders are associated with particular intonational patterns (e.g.,
Downing et al. 2004; Patil et al. 2008; Vainio and Järvikivi 2006; see also Büring 2013),
but also there seems to be a relationship between word order flexibility and the
degree to which information structure is encoded prosodically, via word order, or
both. Swerts et al. (2002) compared prominence and information status in Italian and
Dutch. Italian has relatively flexible word order within noun phrases as compared to
Dutch, and Swerts et al. found that, within noun phrases, Dutch speakers encoded
information status prosodically in production, and took advantage of this informa-
tion in perception. However, the connection between information status and pro-
sodic prominence was much less straightforward in Italian production, and it was
unclear whether Italian listeners were attending to prosodic prominence as a cue for
information status. This work suggests a trade-offbetween word order flexibility and
prosodic encoding of information structure.
A related factor influencing word order is "heaviness" (i.e., the length, usually in
words, of a constituent, especially relative to another in the same utterance). In the
already mentioned study, Arnold et al. (2000) also showed that heaviness is an
independent factor that determines constituent order in English, such that speakers
prefer to produce short and given phrases before long and new ones (e.g., I gave the
Prime Minister a signed copy of my latest magnum opus is preferred to I gave a signed
copy of my latest magnum opus to the Prime Minister). In contrast, Yamashita and
Chang (2001) showed that Japanese speakers prefer to order long phrases before
short ones, suggesting that the typological differences between the languages lead to
speakers weighting formal and conceptual cues in production differently (for a
connectionist model, see Chang 2009).
These heaviness effects and their cross-linguistic variation have been
explained by the pressure to minimize dependency lengths and similar processing
principles, including Early Immediate Constituents (EIC) (Hawkins 1994), Minimize
Domains (MiD) (Hawkins 2004), Dependency Locality Theory (DLT) (Gibson 1998)
and Dependency Length Minimization (DLM) (Ferrer-i-Cancho 2004; Temperley
and Gildea 2018).11 These principles share a similar general prediction, which posits
that words or phrases that bear syntactic and/or semantic dependency relations
with each other tend to occur closer to each other. For example, dependents of the
verb move to a position close to the verb. Empirical support and motivation for
these principles mainly come from language comprehension studies (Gibson 2000),
to the effect that shorter dependencies are preferred in order to ease processing
and efficient communication (Gibson et al. 2019; Hawkins 1994, 2004; Jaeger and
Tily 2011). These principles explain the above-mentioned preference “light before
heavy”in VO languages like English, and “heavy before light”in OV languages like
Japanese.
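The shared prediction can be illustrated with a minimal sketch (our own toy calculation, not taken from any of the cited frameworks): for a head followed by two dependent phrases, placing the shorter phrase first yields a smaller total head-dependent distance, which is the "light before heavy" pattern expected in VO languages.

```python
def total_dep_length(phrase_lengths):
    """Total distance from a head at position 0 to the first word of each
    dependent phrase that follows it, given the phrases' lengths in order."""
    total, position = 0, 1
    for length in phrase_lengths:
        total += position      # distance from the head to this phrase's start
        position += length     # the next phrase starts after this one
    return total

# A 2-word phrase and a 6-word phrase after a head (e.g., a verb):
print(total_dep_length([2, 6]))  # short before long -> 4
print(total_dep_length([6, 2]))  # long before short -> 8
```

Under DLM-style accounts, the ordering with the smaller total is preferred; the same arithmetic applies in mirror image to preverbal dependents in head-final domains, yielding "heavy before light".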
As revealed by corpus-based investigations, the effect of these principles varies
cross-linguistically. Previous work looking at syntactic dependencies has provided
evidence that languages minimize the overall or average dependency length to a
different extent (Futrell et al. 2015a, 2020; Gildea and Temperley 2010; Liu 2008; Liu
2020). The effects also depend on specific constructions. A study by Futrell et al. (2020)
showed further that the average dependency length given fixed sentence length is
longer in head-final contexts (see also Rajkumar et al. 2016). Other experiments have
investigated DLM in syntactic constructions with flexible constituent orders
(Gulordava et al. 2015; Liu 2019; Wasow and Arnold 2003; Yamashita 2002); with an
examination of the double PP construction across 34 languages, Liu (2020) demon-
strated a typological tendency for constituents of shorter length to appear closer to
their syntactic heads. The results also indicated that the extent of DLM is weaker in
preverbal (head-final) than postverbal (head-initial) domains, a contrast that is less
constrained by traditionally defined language types (Hawkins 1994); this contrast
also seems to hold when the effects of additional factors such as lexical frequency
and contextual predictability are controlled for (Liu 2022). Note that the patterns in
the preverbal context are in opposition to previous findings with transitive con-
structions in Japanese (Yamashita and Chang 2001) and Korean (Choi 2007).
11 While EIC, MiD and DLT were all built upon a phrase structure framework, DLM was formulated
using the dependency grammar framework. Though EIC, MiD and DLT have not used the term
dependency length directly, the notion can be easily derived.
Word order flexibility can be advantageous for language users by helping
to maximize fluency, permitting users to articulate easily retrievable elements
early, and leaving time to plan the more cognitively demanding elements. At the
same time, language users are inclined to re-use more global phrasal and sentence
plans. This is observed in the recycling of highly accessible abstract schemas
known as syntactic priming (Ferreira and Bock 2006). Rigid word order, which
allows language users to recycle the same routinized articulation plan, can also
facilitate language production (MacDonald 2013). The re-use of highly entrenched
production plans in conceptually similar constituents can also account for
analogy effects in word order (for instance, the order of genitive and noun
matches that of adjective and noun, cf. Diessel 2019). This kind of recycling is also
important for frequently co-occurring lexemes (e.g., Sasano and Okumura 2016).
Since the sequential associations between linguistic units can be of different
strength (Diessel 2019), we observe gradience in the individual speaker/signer’s
grammar knowledge and in their interaction with other users. Highly frequent
sequences have strong sequential associations due to automatization and
chunking (Bybee 2010; Diessel 2019). Frequency-driven grammaticalization is
usually accompanied by a loss of syntagmatic freedom (Lehmann 1995 [1982]). We
can therefore speak of a trade-off between the ease of production of individual
words and their sequences. Different languages and varieties have different
points of balance, leading to cross-linguistic gradient patterns. The differences
between languages can be explained by particular features of different gram-
mars: cross-linguistic studies reveal that grammatical features of individual
languages (e.g., verb agreement or aspectual marking) constrain language pro-
duction (e.g., Kimmelman 2012; Norcliffe et al. 2015; Nordlinger et al. 2022; Proske
2022; Sauppe et al. 2013), even at the earliest stages of sentence planning. Taking a
gradient approach to how these factors interact is crucial for capturing both the
distribution of cross-linguistic variation and the dynamics of within-language
variation.
Although research on comprehension and production regularly uses different
methods and tests mode-specific hypotheses, it is likely that each makes use of the
same underlying representational vocabulary and processes (Momma and Phillips
2018). Thus, all things being equal, we should expect many of the constraints on word
order gradience in production to determine expectations about word order vari-
ability in comprehension (see MacDonald 2013, and commentaries, for discussion of
one proposal along these lines), although exactly what those constraints are, and
how they are implemented in parsing and production architectures, is still an open
question.
2.2 Language acquisition
The outcomes of learning vary at different points across the lifespan. By and large,
children faithfully replicate the stable distributional properties of their input
(Ambridge et al. 2015), thus maintaining the variability observed cross-linguistically
in word order patterns.12 That is, an English-acquiring child will acquire the vari-
ability exhibited in English, and a Latvian-acquiring child will acquire the variability
exhibited in Latvian. Word order flexibility is in principle not problematic for
acquisition since children acquire so-called free-word order languages such as
Warlpiri without problem (e.g., see Bavin and Shopen 1989). All this means that the
processes of language development allow for the learning of gradient patterns and
support the maintenance of word order variability if it exists in the ambient
language.
As for adult L2 learners, experimental evidence shows that they have a strong
bias towards regularity during language diffusion (Smith and Wonnacott 2010). In
these experiments, input languages exhibiting free variation become increasingly
regular (Wonnacott and Newport 2005).13 Also, it seems that adults are good at
learning abstract patterns that can be captured by a few simple rules, while children
are better at memorizing strings without any underlying rules (Nowak and Baggio
2017). Based on this, we can hypothesize that a high number of L2 adult learners may
result in lower variability in word order in the target language, although this needs to
be further investigated. Taking a gradient approach to word order flexibility would
allow for a more nuanced picture which could potentially incorporate more complex
contexts of language contact (see Section 2.4).14
12 Although in the case of variable or incomplete input, word order preferences may emerge
spontaneously, such as in children learning homesign (Goldin-Meadow and Mylander 1983).
13 At the same time, adults tend to probability-match free variation in an input language under
certain conditions more than children do (Hudson Kam and Newport 2005), but this effect is
restricted by different factors. In particular, it is observed when adults reproduce already familiar
input and when the free variation is between only two alternatives.
14 Cf. related work by Bentz and Winter (2013) which shows a statistical tendency for reduced case
marking in languages with more L2 users.
2.3 Language change
While word order change may be caused by language contact (see Section 2.4, and
also Bradshaw 1982; Doğruöz and Backus 2007; Heine 2008, etc.), including substrate
influence, this section focuses on internal change, which is influenced by the
cognitive factors discussed above.15 These changes are transferred and/or amplified
via language acquisition, which leads to language- or lineage-specific pathways in
word order change (cf. Dunn et al. 2011).
Hawkins (1983: 213) states that word order change takes place through
“doubling”: a new order comes in, co-exists with the old one, and replaces the old one
through increase in frequency of occurrence and grammaticalization. We can
represent this scenario informally as follows: WO_A > WO_A & WO_B > WO_B. This may
not be true for all word order change, but this process is a basic benchmark. Hence,
change in word order is inherently tied up with variation. For example, Bauer (2009)
describes word order change in Latin as part of a general tendency or “drift”(Sapir
1921) in Indo-European, with change away from rigid left-branching (OV) order in
Proto-Indo-European (Bauer 2009: Sect. 2.1), towards flexible right-branching (VO)
order in Latin, to more rigid right-branching order in Romance.
Importantly, this change happens first at the level of specific constructions, not
sweepingly across all constituents at once.16 The doubling and word order variability
are a result of these local changes. Bauer (2009: Section 3) describes in detail the
constructions that changed from left-branching to right-branching order in their
consecutive order, with verb-complement order changing after right-branching
relative clauses, prepositions, and noun-adjective order emerged. Variation arose
through a long-term development away from left-branching structure: postpositions
became archaic while prepositions arose, and both were in non-free
variation (i.e., some lemmas were postpositional, others prepositional) for an
extended period of time. In addition, stylistic and pragmatic word-order variation
arose, such as fronting of the subject and the verb. Several of these pragmatic word
orders have been grammaticalized in Romance languages, most importantly using
the cleft construction for emphasis. All of these changes are interdependent and
dependent on the previous state of the language, i.e., Bauer (2009: 306) emphasizes
that variation is not arbitrary, but “in fact [is] rooted in the basic characteristics of
the language and [is] connected to many other linguistic features”.
15 Substrate influence versus internal change have been suggested for change to VSO in Celtic;
Pokorny (1964) argues for a substrate influence account, Watkins (1963) and Eska (1994) for a
language-internal account.
16 Though cf. Kroch (1989) and Taylor and Pintzuk (2012) who argue that English OV to VO change
happened in a categorical manner.
However, variation in word order does not always imply (ongoing) language
change. Word order variability can be stable for centuries. Examples are adjective-
noun and noun-adjective order in Romance and many other languages, such as in
Greater Philippine (Austronesian), where four of the six languages in Dryer (2013a)
show variable order and at least seven more languages have both orders
(Santo Domingo Bikol in Fincke [2002: 93–94]; Mandaya in Estrera [2020: 26–27];
Mansaka in Svelmoe and Svelmoe [1974: 51]; Kalagan in Collins [1970: 66–67];
Cebuano in Tanangkingsing [2009: 134]; Romblomanon in Law [1997: 19–20]; Inonhan
in van den Heuvel [1997: 39]). Other examples are verb-object order in main/sub-
ordinate clauses in Germanic and many other languages, such as Western Nilotic
(for Dinka and Nuer see Salaberri [2017], other relevant Western Nilotic languages
are Jumjum in Fadul et al. [2016: 39] and Reel in Cien et al. [2016: 88]), and co-existence
of prepositions and postpositions in Gbe and Kwa languages (Aboh 2010; Ameka 2003;
see in general Hagège 2010: 110–114). There are usually different diachronic sources
for each part of the doublet (Hagège 2010; Hawkins 1983), but this does not neces-
sarily imply that one word order will outcompete the other over the course of
generations. While word order change implies variability, variability does not
necessarily imply change.
On the other hand, language change can involve rigidification and loss of flex-
ibility of frequently used constructions. Croft (2003: 257–258), citing Lehmann (1995
[1982]) and Heine and Reh (1984), calls the grammaticalization of word order rigid-
ification, “the fixing of the position of an element which formerly was free”(see also
Lehmann 1992). See Hawkins (2019) for the relation between rigidification and
Sapir’s (1921) notion of drift, and Harris and Campbell (1995: Ch. 8) for a summary of
word order change.
2.4 Language contact and vernaculars
While borrowing of orders wholesale from one language to another is one potential
outcome, language contact can also have a non-obvious effect on word order vari-
ability. For example, Heine (2008) discusses change to flexibility as one potential
outcome of language contact. Contact may lead to increased or decreased variability.
This has been documented in a number of Australian Aboriginal languages, which
have famously flexible word order, after they came under pressure from the rela-
tively more rigid English or Kriol (e.g., Lee 1987; Meakins 2014; Meakins and O’Sh-
annessy 2010; Richards 2001; Schmidt 1985a, 1985b). In this situation, the frequency of
SVO orders may increase, and the degree of flexibility (entropy) may also decrease.
This is the main conclusion of Namboodiripad et al. (2018), who found that English-
dominant Korean-speakers showed a greater relative preference for the canonical
SOV order than Korean-dominant speakers, that is, they rated non-canonical orders
relatively low as compared to the canonical SOV order. Thus, we see a significantly
lower constituent order flexibility correlating with more English dominance, but not
overt borrowing of the dominant English order into Korean. A greater preference for
canonical order corresponding to increased language contact was also found within
a community of Malayalam speakers by Namboodiripad (2017).
Depending on the social context, language contact within a community can often
correspond to intergenerational variation. An example can be found in Pitjantjat-
jara, a Pama-Nyungan language of Central Australia, which has traditionally been
described as having a dominant SOV order, with a great degree of flexibility. Bowe
(1990) finds SOV order to be approximately twice as frequent as SVO order (although
clauses with at least one argument omitted are much more frequent in her corpus).
Langlois (2004), in her study of teenage girls' Pitjantjatjara, finds that this is reversed,
and SVO order is 50% more frequent than SOV in her sample. An experimental study
of word order in Pitjantjatjara (Nordlinger et al. 2020; Wilmoth et al. Forthcoming)
also substantiated this general trend; the younger generations had a higher pro-
portion of SVO order, presumably as a result of more intense language contact and
more English-medium education. However, while the distribution of different orders
appears to be changing, the overall degree of flexibility as measured by entropy (see
Section 4.1.1) remained approximately stable among all generations of speakers.
There was some effect of older female participants, many of whom had worked as
teachers or translators, being more rigidly SOV. This may be a result of their pre-
scriptive attitudes interacting with the experimental setting.
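Flexibility measured as entropy (detailed in Section 4.1.1) can be sketched with a toy calculation; the counts below are hypothetical and not drawn from the Pitjantjatjara studies, but they show how the dominant order can shift while overall flexibility stays constant:

```python
import math

def order_entropy(order_counts):
    """Shannon entropy (in bits) of a distribution over constituent orders."""
    total = sum(order_counts.values())
    probs = [n / total for n in order_counts.values() if n > 0]
    return -sum(p * math.log2(p) for p in probs)

# Hypothetical counts: the dominant order flips (SOV -> SVO) between
# generations, yet the entropy of the distribution is unchanged.
older = {"SOV": 60, "SVO": 30, "OSV": 10}
younger = {"SOV": 30, "SVO": 60, "OSV": 10}
print(round(order_entropy(older), 3))    # -> 1.295
print(round(order_entropy(younger), 3))  # -> 1.295
```

A fully rigid language (one attested order) would score 0 bits; entropy rises as the probability mass spreads more evenly over more orders.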
The presence or absence of pressure of prescriptive norms in a particular lan-
guage variety is an important and relatively under-investigated factor in the dy-
namics of word order. Specifically, vernacular varieties are relatively free from the
pressure of prescriptive norms and are more inclined to word order variation
compared to “standard”varieties. An example is reported in the study by Naccarato
et al. (2020) on genitive noun phrases in several spoken varieties of Russian. In
Standard Russian, the neutral and most frequent word order in genitive noun
phrases is Noun followed by Genitive modifier, while two types of vernacular
Russian – dialects and contact-influenced varieties – often show the alternative
order Genitive modifier followed by Noun. Interestingly, kinship semantics, which
was the strongest factor affecting the choice of a specific word order, turns out to
be the same for both dialectal and contact-influenced Russian (irrespective of the
area and the indigenous languages spoken there). This situation resembles in-
stances of so-called “vernacular universals”, that is, features that are common to
different spoken vernaculars (Filppula et al. 2009; see also Röthlisberger and
Szmrecsanyi 2020), which are sometimes interpreted as language "natural ten-
dencies" in the sense of Chambers (2004). If this is the case, we could argue that
word order rigidity is imposed through standardization, and variation might be
the result of the natural development of language systems that, due to
geographical and historical reasons, are less tightly connected to a particular
standard, or which have resisted or rejected standardization.
3 What we (would) miss out without gradient
approaches
The previous section demonstrated that word order variability is supported by
diverse cognitive and social factors. How all these factors interact with each other is
not yet fully understood. The case studies presented above have also shown that
investigating gradient patterns and their causes is a fruitful endeavor. In this
section, we strengthen this argument, presenting a series of research questions
that could not be asked or answered without applying a gradient approach
systematically.
3.1 Language processing
A focus on word order gradience is very much the bread and butter of psycho- and
neurolinguistic studies of grammar, where substantial effort has been invested in
explaining ordering constraints on production and the implication of word order
variability (and its correlation with grammatical structure) for comprehension. As
an illustration, consider corpus-based research on dependency length minimization
(DLM), which was discussed in Section 2.1. Some studies suggest that the crosslin-
guistic variation in the effect of dependency length is due to the fact that certain
languages have more word order freedom (Futrell et al. 2015a; Futrell et al. 2020;
Gildea and Temperley 2010; Yamashita and Chang 2001). The argument goes that with
more ordering variability, the ordering preferences might be less subject to DLM and
possibly abide more by other constraints such as information structure (Christianson
and Ferreira 2005). Nevertheless, recent findings from Liu (2021) showed that this is not
necessarily the case. Using syntactic constructions in which the head verb has one
direct object noun phrase dependent and exactly one adpositional phrase dependent
adjacent to each other on the same side (e.g., Kobe praised his oldest daughter from
the stands), the results indicated that there is no consistent relationship between
overall flexibility and DLM. On the other hand, when looking at specific ordering
domains (e.g., preverbal vs. postverbal), on average there is a very weak correlation
between DLM and word order variability at a constructional level in the preverbal
contexts (e.g., preverbal orders in Czech); while no correlation, either positive or
negative, seems to exist between the two in postverbal constructions (e.g., postverbal
orders in Hebrew).
Another illustration is the connection between prosody and syntax, which is an
active area of investigation (Eckstein and Friederici 2006; Franz et al. 2020; Kreiner and
Eviatar 2014; Luchkina and Cole 2021; Nicodemus 2009; Vaissière and Michaud 2006).
The work that has investigated the connection between prosody and word order from
a phonetic perspective does take phonetic gradience into account, and adding a gradient
approach to word order further enriches this area of research. For example, Šimík and
Wierzba (2017) combine gradient constraints in prosody and syntax to model flexible
word order across three Slavic languages. They combine this with a gradient measure of
flexibility in the form of acceptability judgments and find complex interactions between
prosody and information structure (see also Gupton 2021). Asking and answering these
questions about language production and comprehension would be impossible without
taking a gradient approach.
3.2 Language acquisition
Conceptualizing word order in terms of gradience naturally aligns with the large
focus on language acquisition as a cue-based process, which is most notable in
probabilistic functionalist approaches such as the Competition Model (Bates and
MacWhinney 1982) or constraint-satisfaction (Seidenberg and MacDonald 1999) but
is also a feature of formal approaches such as Optimality Theory (Prince and Smo-
lensky 2004). The challenge for acquisition researchers is to determine cue-based
hierarchies across languages and how they interact; in addition, they must deter-
mine whether children bring pre-existing biases to this problem, such as the oft-cited
preference to produce agents before lower-ranking thematic roles (Jackendoff and
Wittenberg 2014), and to interpret early appearing nouns as agents (e.g., Bornkessel-
Schlesewsky and Schlesewsky 2009).
For instance, while both German and Turkish allow for flexibility in the order of
the main constituents with thematic roles marked via case, the transparency of the
Turkish case system relative to German means that Turkish-acquiring children find
variability in word order less problematic than their German-acquiring counter-
parts, who tend to prefer to attend to word order over case until they are much older
(Dittmar et al. 2008; Slobin and Bever 1982; Özge et al. 2019). In the case of German, it
appears that children settle on word order as a “good enough”cue to interpretation
while they acquire the nuances of the case system (for a similar effect in Tagalog, see
Garcia et al. 2020), although they are also sensitive to deviations in word order
marked by cues such as animacy (Kidd et al. 2007).
A focus on cue-weightings (or constraints) has the potential to both inform the
creation of more dynamic models of acquisition (thereby linking them to theories of
adult language processing and production), and also to force questions common to
acquisition concerning representation. Namely, are the cues coded in the structure,
as might be argued in construction/usage-based approaches (e.g., Ambridge and
Lieven 2015), or do they guide, but remain independent of, the sequencing choices of the
processing system? For instance, do children acquiring European languages like
English or German learn non-canonical structures such as object relative clauses as
constructions by encoding common properties such as [-animate Head Noun] as part
of a prototype structure (e.g., Diessel 2009), or do these distributional properties
provide cues to a more abstract structure building mechanism? Understanding what
causes word order gradience in the target languages and how children identify these
variables is key to answering these questions.
3.3 Language change
Word order has typically been measured in a categorical fashion in typology, making
gradient phenomena somewhat marginal (see Section 3.5). We can observe the same in
historical linguistics: Harris and Campbell (1995: 198) point out that word order re-
constructions have ignored frequently attested alternative word orders. Barðdal et al.
(2020: 6) hold that under a constructional view, this should not be allowed, while
“variation in word order represents different constructions with different
information-structural properties, as such constituting form-meaning pairings of their
own, which are by definition the comparanda of the Comparative Method”. Modeling
sentential word order change in terms of categorical bins such as SVO, VSO, etc. does
not allow for the inclusion of doubling, the intermediary step of an alternative word
order arising; the process of rigidification is glossed over completely (see Section 2.3).
Excluding variation from reconstruction makes reconstructions incomplete at best,
and wrong at worst. See Section 4.4 for a discussion of how to incorporate gradient
measures in a phylogenetic analysis, and which caveats can arise when doing so.
3.4 Language contact
A gradient approach to word order appears to be useful in studies in language
contact. Although syntactic changes usually start at an intense stage of contact, word
order is one of the linguistic features that are more easily borrowed (cf. Thomason
2001: 70–71; Thomason and Kaufman 1988: 74–76). Investigations on contact-induced
word order variation in the world’s languages include studies focusing on word
order at the sentence level (cf. numerous examples listed in Heine 2008: 34–35 and
Thomason and Kaufman 1988: 55), and studies devoted to variation within the noun
phrase, e.g., focusing on the order of the head noun and its genitive modifier. See, for
instance, Leisiö (2000) on Finnish Russian, and some observations in Sussex (1993) on
émigré Polish, as well as Friedman (2003) on Macedonian Turkish. Most of these
studies take a more categorical approach, focusing on the addition or loss of
particular orders.
However, word order variability has been (indirectly) implicated as a potential
source of the facility of contact-induced change in this domain, via convergence (e.g.,
Aikhenvald 2003): More flexible languages are more likely to have more frequent
alternative orders, and thus are also more likely to have convergent structures with
their contact languages. As such, contact could boost the frequency or change the
status of a previously less frequent or non-canonical but still attested order (e.g.,
Manhardt et al. 2023). So "borrowing" of an order in this case could simply be a
change in the status of a previously derived or less frequent order to a ca-
nonical or more frequent one, as in the case of Pitjantjatjara discussed in Section
2.4. Measuring the degree of flexibility, and accounting for the relative status of all
grammatical orders in a language, allows us to see these types of contact effects,
which might not be otherwise detectable.
An illustration of the usefulness of gradient approaches is a study of English-
dominant and Korean-dominant Korean speakers by Namboodiripad et al. (2018),
which was mentioned in Section 2.4. They found that English-dominant Korean
speakers rated canonical SOV order the highest, followed by OSV, the verb-medial
orders the next highest, and verb-initial orders the lowest of all. This is the same
pattern shown by age-matched Korean speakers who grew up and live in Korea.
However, there was a significant quantitative difference based on the degree of
contact with English, as discussed above: more precisely, the English-dominant
Korean speakers showed a greater relative preference for the canonical SOV order
than Korean-dominant speakers. In taking a gradient approach, we are able to see
contact effects that might not otherwise be visible. That is, even when we do not see
an outright change to the basic constituent order in a language, or even if we do not
see the addition of a constituent order due to contact, we might see a decrease in the
number of possible orders, or a decrease in the relative frequency or acceptability of
some orders (see Section 4.2 for more details on this method).
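A schematic illustration of such a rating-based gradient measure (the numbers below are invented for exposition and are not the data from Namboodiripad et al. 2018):

```python
# Hypothetical acceptability ratings (1-7 scale) for each constituent order.
ratings = {
    "SOV": [7, 6, 7, 6], "OSV": [5, 5, 6, 4],
    "SVO": [4, 4, 3, 4], "OVS": [3, 4, 3, 3],
    "VSO": [2, 2, 3, 2], "VOS": [2, 3, 2, 2],
}
means = {order: sum(r) / len(r) for order, r in ratings.items()}

# One gradient summary: how far the canonical order outscores the
# best-rated non-canonical order. A larger gap suggests less flexibility,
# even if the ranking of orders is identical across speaker groups.
canonical_advantage = means["SOV"] - max(
    m for order, m in means.items() if order != "SOV")
print(canonical_advantage)  # -> 1.5
```

Comparing this gap across speaker groups (e.g., English-dominant vs. Korean-dominant) is what makes a quantitative contact effect visible even when no order is added or lost.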
Moving to another example, what at first glance looks like the result of L1
calquing is actually the by-product of a more complex interaction of different factors.
In previous research it has been frequently pointed out that contact in fact reinforces
some existing language-internal tendencies (cf. Poplack 2020; Poplack and Levey
2010; Thomason and Kaufman 1988: 58–59). A similar conclusion is drawn in a recent
study by Naccarato et al. (2021) devoted to word order variation in noun phrases with
a genitive modifier in the variety of Russian spoken in Daghestan. In this variety of L2
Russian the Genitive-Noun order in noun phrases is unexpectedly frequent,
whereas in Standard Russian the Noun-Genitive order is the neutral and by far the
most frequent option (see also Section 2.4). A quantitative analysis of the Dag-
hestanian data and the comparison with monolinguals' varieties of spoken Russian
suggest that contact is not the only factor involved in word order variation in Dag-
hestanian Russian. Rather, L1 influence in bilinguals’speech (here, Nakh-
Daghestanian or Turkic languages) seems to interact with some language-internal
tendencies in Russian that, to a certain extent, are observed for non-bilinguals’
varieties too. Without a gradient approach, it would be impossible to test this
interaction.
3.5 Linguistic typology
For theoretical and practical reasons (e.g., the over-emphasis on basic word order in syntax, and the level of granularity available in grammar descriptions), typologists have mostly coded word order using categorical variables. The famous Greenbergian “correlations” (Greenberg 1963), for instance, represent associations between categorical variables, e.g., VO/OV and pre- or postpositions. This practice is responsible for two types of data reduction in typology (cf. Wälchli 2009): first, languages with low word order entropy are more likely to be described accurately and systematically than languages with high entropy; second, the word order variables studied by typologists are biased towards those that represent bimodal distributions.
This can have consequences for characterizing, documenting or revitalizing the
language.
Using a gradient approach can help us deal with these limitations by exploring the points between the extremes. Moreover, it allows us to include a wider range of constructions. For example, oblique nominal phrases and adverbials are highly variable
positionally with regard to their heads, which is why they are less frequently
considered in word order correlations than the “usual suspects”, such as the order of
Verb and Object (Levshina 2019). A gradient approach also allows us to formulate
more precise predictions for typological correlations and implications, minimizing
the space of possible languages, and making the universals more complete (see
Gerdes et al. 2021; Levshina 2022; Naranjo and Becker 2018). For example, instead of
including only predominantly VO or OV languages, we can also make predictions for
more flexible languages which may not fit neatly into one category or another.
Consider an illustration. It is well known that there is a negative correlation between word order rigidity and case marking (Sapir 1921; Sinnemäki 2008). This relationship, traditionally stated in categorical terms, can be represented by continuous variables, as shown in Figure 3, which is based on data from 30 online news corpora. Word order variability is measured as the entropy of Subject and Object order (see Section 4.1.1), while case
marking is represented as Mutual Information between case markers and the syn-
tactic roles of Subject and Object. It shows how much information about the syntactic
role of a noun we have if we know the case form, or, in other words, how system-
atically cases can actually help one guess if the noun is Subject or Object in the corpus
(see Levshina 2021b for more details). The dataset is available in the OSF directory as
Dataset3.txt. The plot shows clearly that the correlation is observed in the entire
range of values of both variables. In particular, languages with differential marking tend to occupy the mid-range of word order variability, while languages with very little overlap between Subject and Object case forms, such as Lithuanian and Hungarian, allow for maximal freedom (Levshina 2021a). This supports the functional-adaptive
view of language, showing that case and word order are not inherent built-in
properties, but efficiently used communicative cues (cf. Koplenig et al. 2017). Word order gradience thus plays an important role in functional explanations of cross-linguistic generalizations (cf. Schmidtke-Bode et al. 2019).
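To make the measure concrete, the following sketch computes mutual information between case marker and syntactic role from (case, role) co-occurrence counts. It is a simplified stand-in for the corpus-based procedure of Levshina (2021b); the function name and the toy counts are our own inventions, not data from the corpora discussed here.

```python
import math
from collections import Counter

def mutual_information(pairs):
    """I(Case; Role) in bits, from a list of (case, role) observations."""
    joint = Counter(pairs)
    n = sum(joint.values())
    case_marg, role_marg = Counter(), Counter()
    for (case, role), k in joint.items():
        case_marg[case] += k
        role_marg[role] += k
    # Sum p(c,r) * log2( p(c,r) / (p(c) * p(r)) ) over all observed pairs
    return sum((k / n) * math.log2((k / n) /
               ((case_marg[case] / n) * (role_marg[role] / n)))
               for (case, role), k in joint.items())

# Invented data: a perfectly reliable case cue (NOM = Subject, ACC = Object)
perfect = [("NOM", "S")] * 50 + [("ACC", "O")] * 50
# and a fully syncretic one, where the case form tells us nothing about the role
syncretic = [("X", "S")] * 50 + [("X", "O")] * 50
```

With the invented data above, the perfectly reliable cue yields 1 bit (the full entropy of the role distribution), while the syncretic cue yields 0 bits; differential marking would fall somewhere in between.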
Figure 3: Mutual information between syntactic role (Subject or Object) and formally marked case plotted against Subject–Object order entropy, based on online news corpora.
3.6 Summary
As demonstrated in this section, gradient approaches have already allowed researchers to discover many important facts about word order across different linguistic domains, from language contact to prosody. Without a gradient perspective,
these phenomena could not be investigated. Using a gradient approach also allows us
to include more constructions and languages when testing theoretical hypotheses,
avoiding data reduction. At the same time, the potential of gradient approaches has
not been fully tapped. One important reason is the number of methodological challenges and caveats involved. In the next section, we discuss these challenges and propose some solutions.
4 How to investigate gradience? Challenges and
some solutions
In this section, we delve into the how of implementing our gradient approach. As we
have discussed, methodological limitations have been one barrier to more wide-
spread adoption of gradient approaches to word order. Here, we discuss these
challenges along with current solutions, covering corpora (Section 4.1), experimental
methods (Section 4.2), fieldwork practices (Section 4.3), and phylogenetic compara-
tive methods (Section 4.4). Along the way, we present some novel analyses of corpora
and experimental results using these approaches.
4.1 Measuring gradience in corpora: methods and challenges
4.1.1 Gradient measures and sample size
Recent work on word order variability tends to use a gradient approach to provide a
more accurate picture of word order patterns in language (Ferrer-i-Cancho 2017;
Futrell et al. 2015b; Hammarström 2016; Levshina 2019; Östling 2015). Most of these
studies are based on large-scale annotated corpora such as the Universal De-
pendencies (Zeman et al. 2020; also see Croft et al. [2017] for a discussion of using UD
corpora to address questions in linguistic typology). Two common gradient measures are the proportion of a particular word order and its Shannon entropy. As an example, let us imagine that we are interested in the order of the object and the verb in a language. One straightforward way to measure the stability of this order is to consider the proportions of OV and VO
orders. Another measure frequently used is Shannon’s entropy (Shannon 1948),
which represents the uncertainty of the data. The higher the entropy (which ranges between 0 and 1 in the case of two possible outcomes), the more variable the word order. For instance, a distribution of 500 sentences with OV and 500 sentences with VO would result in an entropy of 1.[17] If the word order is always OV or always VO, the entropy will be 0. If OV is observed 95% of the time, the entropy will be equal to −[(0.05 * log2(0.05)) + (0.95 * log2(0.95))], which is 0.2864.[18]
Generally speaking, the
choice between entropy or simple proportions depends on what is more important
for the researcher: reliability of word order as a cue for interpreting the speaker’s
message, or the preference for a specific word order. That said, entropy is particu-
larly useful when considering word order variability in case of more than two
possibilities, e.g., the order of S, V, and O.
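As an illustration, both measures can be computed in a few lines of Python. This is a minimal sketch: the function and the OV/VO counts below are invented for the example, not taken from any of the corpora discussed here.

```python
import math
from collections import Counter

def order_entropy(counts):
    """Shannon entropy (in bits) of a distribution of word order variants."""
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total)
                for n in counts.values() if n > 0)

# Invented counts: 95% OV, 5% VO
counts = Counter({"OV": 95, "VO": 5})
p_ov = counts["OV"] / sum(counts.values())  # proportion of OV: 0.95
h = order_entropy(counts)                   # entropy: ~0.2864 bits
```

Because the function takes a Counter over arbitrarily many variants, the same code covers the three-constituent case (SOV, SVO, VSO, etc.), where the maximum entropy is log2 of the number of attested orders rather than 1.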
While these measures are relatively established, the effect of corpus size is
frequently mentioned as a potential source of error. As an example, “the corpora
available vary hugely in their sample size, from 1,017 sentences of Irish to 82,451
sentences of Czech. An entropy difference between one language and another might
be the result of sample size differences, rather than a real linguistic difference”
(Futrell et al. 2015b: 93–94). The question of how large a corpus should be to provide
robust measures of word order is very difficult because the answer depends on the
frequency of the construction we investigate, which can also vary substantially
across languages and genres. Here, we will focus on how many instances of a con-
struction are necessary to measure entropy for the relative ordering of Subject and
Verb (S&V) and Verb and Object (V&O). We aim to show what the appropriate sample
size might be.
Let us consider the order of S&V and V&O within a sample of 30 languages
extracted from the Universal Dependencies: Arabic, Basque, Bulgarian, Catalan,
Chinese, Croatian, Danish, Dutch, English, Estonian, Finnish, French, German, He-
brew, Hindi, Indonesian, Italian, Japanese, Korean, Latvian, Norwegian, Persian,
Polish, Portuguese, Romanian, Russian, Slovenian, Spanish, Swedish, and Ukrainian.
This sample is obviously biased toward Indo-European languages, but it serves here
solely for illustrating the methodology. To visualize the effect of corpus size, we
randomly sample clauses containing a verb with a dependent nominal subject/object
from the existing corpora, in increasing steps of 20 clauses. That is to say, we
first sample 20 clauses, then 40, 60, and so on until we reach a size of 2,000 clauses.
For each sample, we measure the proportion of the S&V and V&O orders. The data are available as Dataset4.txt in the OSF directory. Next, we calculate the entropy as described above, as well as the difference from the entropy measured with the corpus size set at 2,000 clauses. This difference is called here the “gap of entropy”. The results, averaged across all languages, are plotted in Figure 4.

Figure 4: The stabilization of the word order entropy across different sample sizes. Each point indicates the gap of entropy between a given sample size and the sample size set as 2,000 instances.

Figure 5: The variation of entropy across different sample sizes for Basque, English, French, and German.

[17] It is worth pointing out that different studies may use variants to calculate the entropy, and these variants may be affected in different ways by corpus size. For instance, Futrell et al. (2015b) have a more complex definition of word order entropy (as opposed to the coarse-grained entropy we use in our examples), which is conditional on several factors and may be more unstable in small corpora.
[18] See www.shannonentropy.netmark.pl for a live calculator of the entropy.
The results indicate that very small sample sizes yield unstable estimates, as the entropy is likely to be biased by the particular sentences sampled, but sample sizes of more than 300 clauses produce entropy gaps of less than 0.1. If we zoom in on a few
individual languages, such as Basque, English, French, and German (see Figure 5) and
plot the entropy measures, we observe that the entropy varies a lot within approx-
imately the first 500 clauses but stabilizes after that threshold. This means that using
a sample size of 500 or even larger is likely to give similar results in terms of S&V and V&O order.
Additional computational measures can be used to quantify where exactly the
entropy converges for each language (see Janssen 2018: 79 and Josserand et al. 2021:
10 for an example of how to calculate where a curve stabilizes). Nevertheless, this
example provides a quick-and-dirty demonstration of how sample size could be
considered in studies related to word order. We recommend that such analyses
should be performed on different word order categories and language families to
investigate to what extent the entropy measure depends on the sample size cross-
linguistically and/or across different constructions.
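The sampling procedure described above can be approximated with a short simulation. This is a hedged sketch rather than our actual pipeline: the “true” OV proportion of 0.7 is invented, and the gap is computed against the entropy of that assumed population proportion instead of against a 2,000-clause sample.

```python
import math
import random

def entropy(p):
    """Binary entropy in bits for a proportion p of one of two orders."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

random.seed(1)
P_OV = 0.7  # invented "true" proportion of OV

reference = entropy(P_OV)
gaps = {}
for size in range(20, 2001, 20):          # 20, 40, ..., 2,000 clauses
    sample = random.choices(["OV", "VO"], weights=[P_OV, 1 - P_OV], k=size)
    p_hat = sample.count("OV") / size
    gaps[size] = abs(entropy(p_hat) - reference)   # "gap of entropy"
```

Plotting gaps against size gives the same qualitative picture as Figure 4: large, erratic gaps for the smallest samples, which then shrink and stabilize after a few hundred clauses.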
4.1.2 Potential biases in corpora: annotation methods
Parallel and comparable corpora of reasonable size usually rely on automatic
annotation, thus introducing a potential bias depending on the quality of annotation.
The quality is strongly influenced by the size and text types of the treebanks available
for training the software performing the automatic annotation (the “parser”).
If we take a look at the current version (at the time of writing: 2.7) of the
Universal Dependency Treebanks, we see that just 21 languages out of 111 have more
than 3 treebanks. These languages are mostly national languages of Asian and Eu-
ropean countries, and among them, only Arabic, Czech, French, German, Japanese,
Russian and Spanish have treebanks with a total number of more than 1M tokens. In
order to quantify the annotation quality bias both within and across languages, we
run a comparison between the automatic annotation of the parallel corpus of Indo-
European Prose and More (CIEP+: Talamo and Verkerk 2022) and the corresponding
UD treebanks used for training the parser.[19]

Our comparison focuses on the Shannon entropy of the relative order of nominal heads and modifiers in eleven languages from five different genera of the Indo-European family. Furthermore, we limit our analyses to the following syntactic relations/UD relations: adpositions (case), determiners (det),[20] adjectival modification (amod) and relative clauses (acl:relcl).[21] Results are plotted in Figure 6. The dataset is available as Dataset5.txt in the OSF directory, which also contains the code for the statistical analysis.

Figure 6: A comparison between the entropy of four nominal dependencies in eleven languages from two data sources: an automatically-parsed corpus (CIEP+) and the UD treebanks that have been employed for training the parser.

[19] UD v.2.5, except for Welsh: v.2.6.
[20] Italian, Polish and Portuguese have specific UD relations for possessive pronouns and/or quantifiers; for data consistency, we have added the entropy of these relations to the entropy of the det relation.
[21] The UD German parser (model: GSD) has very limited support for acl:relcl; accordingly, we do not have data for CIEP+ relative clauses in German.

Assuming that UD treebanks are manually annotated and/or corrected data sources, thus serving as gold standards, the different – and, in many cases, higher – rates of entropy in the CIEP+ corpus are likely due to wrong annotations, or
“noise”. Yet, despite some outliers such as amod and det in Polish and case in Dutch,
the amount of noise in automatic annotations is overall low. A regression model with
absolute differences between entropies estimated based on CIEP+ and UD, and with languages and dependencies as random intercepts, shows that the expected absolute difference (represented by the intercept) is 0.05, which is statistically significant (t = 3.429).[22]
Although the acceptability of this amount of noise depends on the level of precision
required for a specific research question, this value can be seen as relatively low.
Note that source languages may influence target languages in translation
(Johansson and Hofland 1994; Levshina 2017), such that the distributions of words or
grammatical constructions may differ in translated versus spontaneously-generated
texts. However, our results suggest that this is not the case in most languages and
dependencies. Needless to say, both sources of bias (i.e., annotation and trans-
lationese) require further investigation, especially for low-resource languages, but
we hope that our case study gives us some reasons for cautious optimism with regard
to the practical feasibility of the gradient approach.
From a more theoretical perspective, we should add that the UD do not reflect
categorial universalism in the sense that they do not presuppose any innate set of
categories (cf. Haspelmath 2010). The UD were developed based on several desid-
erata, which are often in conflict: they should be suitable for both language
description and language comparison, and for both rapid, consistent annotation by a
human annotator and high-quality computational annotation. They should also be
accessible to non-linguists, which explains why traditional Eurocentric terms are
used.[23]
According to Croft et al. (2017), most of these desiderata match the goals of a
typologist. Importantly, the UD represent an inventory of universal constructions
that serve some communicative functions. For example, the dependency ‘case’ used for adpositions reflects the fact that adpositions are analogous to morphological marking. Together with the annotation of morphological case in a separate annotation layer of the UD, this makes it possible to extract all instances of the universal construction whose function is to relate an argument dependent to its head (Croft et al. 2017: 65–66).
If annotators believe that it is important to reflect certain idiosyncrasies in a particular language, they can add extensions of the annotation tag to make it more
fine-grained. For example, for some Slavic languages, the UD include ‘det:nummod’, which marks a special type of quantifier that does not agree with the quantified noun in case and requires the counted noun to be in the genitive form. Another option is to introduce a special tag, e.g., classifiers for Chinese. Importantly, these differences should be well documented. A linguist planning to use the UD corpora and tools should first check how the categories of interest are encoded in the corpora. But such a check is necessary, in principle, for all kinds of cross-linguistic data.

[22] The model structure was abs_diff ∼ (1|Language) + (1|Dependency). The standard deviations of the random intercepts were 0.022 (Language) and 0.026 (Dependency).
[23] See the full list of criteria at http://www.universaldependencies.org/introduction (last access 02.03.2022).
We would like to acknowledge that in some cases, the UD framework is not yet
directly applicable. For instance, the UD approach is based on annotation of words,
which will not be meaningful for analysis of polysynthetic languages. At the same
time, the UD annotation guidelines allow flexible modifications and extensions. For
example, we could treat each individual morpheme within a polysynthetic word as the equivalent of ‘one word’ in an English sentence, and then design additional dependency relations to show their grammatical properties in relation to the root of the
word. For examples of work in that direction, see Spence et al. (2018) on Hupa, a Dene
language of Northwestern California, and Park et al. (2021) on St. Lawrence Island
Yupik, an endangered language spoken in the Bering Strait region.
4.1.3 The influence of register, text type, and modality
In addition to the problems of corpus size, annotation quality and translationese,
another potential source of bias is different linguistic registers and text types, which
can have different word order distributions (Batova 2020; Brunato and Dell’Orletta
2017; Panhuis 1981). The extent of this variability is an open question: some evidence
suggests, in particular, that register and text type do not have a significant impact on overall patterns of dependency direction (Liu 2008, 2010). For many languages and
constructions, however, we lack sufficient evidence. The most obvious problem is the
bias towards written texts. In particular, the UD corpora are almost universally
written, rather than spoken or signed. In fact, out of the 111 varieties represented in
UD 2.7, only 22 include spoken or signed data, and this data is not always clearly
separated from written data within a given corpus (Zeman et al. 2020). For many
languages, available corpora are restricted to news and web-crawled texts (e.g., the
Leipzig Corpora Collection, Goldhahn et al. 2012). Collection of texts containing the
same content in different languages, or parallel corpora (Cysouw and Wälchli 2007),
may help to mitigate this problem, providing new text types (e.g., fiction or film
subtitles).
The importance of modality for estimation of word order is obvious from the