All content in this area was uploaded by Péter Jeszenszky on Nov 28, 2018
Towards the parameterisation and quantification
of dialect contact potential: An extended abstract
Péter Jeszenszky
University of Zurich, Department of Geography, Winterthurerstrasse 190, CH-8057 Zürich,
Switzerland
peter.jeszenszky@geo.uzh.ch
https://orcid.org/0000-0002-0873-1743
Sandro Bachmann
University of Zurich, German Department, Schönbergstrasse 9, CH-8001 Zürich, Switzerland
sandro.bachmann@uzh.ch
https://orcid.org/0000-0003-3736-6228
Peter Ranacher
University of Zurich, Department of Geography, Winterthurerstrasse 190, CH-8057 Zürich,
Switzerland
peter.ranacher@geo.uzh.ch
https://orcid.org/0000-0002-8680-4063
Abstract
Languages and dialectal innovations spread through the migration of their speakers or through
‘adjacent’ people adapting to a neighbouring variety. Space thus influences the distribution of
languages and dialects, as it may promote or hinder contact. However, different aspects of
language are affected by space in different ways. Some phenomena show gradual distributions,
while others have crisp boundaries. Some phenomena seem to disperse more easily than others.
In this analysis we explore the influence of three spatial predictors (travel times in 1950,
travel times in 2000, and Euclidean distance) on Swiss German dialects. We perform a redundancy
analysis and show how the spatial predictors explain the variation in five syntactic domains.
We find that space has a different influence on different grammatical aspects.
Our method is potentially useful for other ratio-type data, coming from linguistics or business
information, and for other spatially dispersed human-related variables such as ethnicity and
cultural traits.
2012 ACM Subject Classification Physical sciences and engineering → Earth and atmospheric
sciences; Mathematics and statistics; Artificial intelligence → Knowledge representation and
reasoning → Spatial and physical reasoning
Keywords and phrases spatial analysis, principal coordinate analysis, spatial autocorrelation,
linguistics, redundancy analysis, ordination, dialectometry
Acknowledgements We gratefully acknowledge the provision of data by Elvira Glaser and the
German Department at the University of Zurich, and by the Institute for Transport Planning and
Systems at ETH Zurich.
1 Motivation
Together with demographic features like nationality, ethnicity and religion, language and
dialects are important factors for generating identities. Language and its spatial distribution
have a great impact on how society is shaped. In turn, geographic space also impacts language.
159:2 Quantification of dialect contact potential
Language change has strong ties to contact and, conversely, isolation between speakers of
a language [13], which are also governed by spatial factors besides socio-demographic and
historical ones.
Various projects worldwide are directed at the digital collection of dialect data for
quantitative analyses, mostly pursued by linguists. The increasing amount and complexity of
the data call for automated models and, thus, for the involvement of spatial and computational
sciences in dialectology. Despite this, linguistic research has not received much attention
in GIScience.
This study extends an earlier study by the authors [11], which focused on developing a
linguistic distance measure and applying it to dialectal variables in spatial subsets and at
the global level. That study showed that travel times have a higher explanatory potential than
Euclidean distances for the overall variation of (morpho-)syntactic features in Swiss German
dialects. The present study extends this method in the direction of spatial analysis and
explores the influence of spatial predictors, i.e. travel times and Euclidean distance, on
individual grammatical domains.
Different linguistic phenomena tend to display different spatial and social distributions [2]
and are explained by different underlying processes. The goal of this study is to explore the
role of spatial factors in the variation of different dialectal phenomena. On a larger scale, the
research aims to contribute to one of the greatest questions in dialectology and geolinguistics:
How does the spatial diffusion of linguistic variables work?
2 Related work
Dialectometry has long investigated the often sublinear relationship between linguistic and
spatial variables [20]. To characterise the multidimensional nature of dialects, linguistic
variables have been investigated as aggregates, with the prospect of uncertainties in individual
variables levelling out [15]. Dialectometry has striven to explore the continuum of variation at
the level of dialect areas as well as at the level of individual variables.
Different theoretical models have been developed to examine the diffusion of linguistic
innovations. Trudgill [23] confirms that social barriers should also be considered in language
change modelling: ‘urban hierarchical’ or ‘cascade diffusion’ [23][2] assumes that innovations
spread from larger populations towards smaller ones, corresponding to the mobility patterns of
the population, similarly to the effect of gravity.
The axiomatic role of geography in structuring language [16] has been tested in numerous
studies with different explanatory variables, with Euclidean distance as the default. Gooskens
[8] was the first to operationalise the possibility of contact using travel times, while
Szmrecsanyi [22] tested travel times and Trudgill’s linguistic gravity index on British syntax.
The latter, like several other studies, did not find travel times to be a better predictor of
dialectal variation than Euclidean distances. Derungs et al. [3] researched the explanatory
value of different administrative categorical boundaries for dialectal variation, modelling
spatial dependences using hiking distances.
The thesis of Pickl [19] is a good example of (geo)statistical methods being implemented in
dialect research. Ordination techniques such as multidimensional scaling (MDS) (e.g., [10])
have become a state-of-the-art method for dimension reduction intended for visualisation, while
principal component analysis (PCA) (e.g., [21]) and factor analysis (e.g., [19]) are used to
detect linguistic items showing similar geographical patterns. Importantly, geographic factors
have been represented using generalised additive modelling [24], which makes it possible to
find the (non-linear) functional relation of multiple explanatory variables to linguistic
variation.
P. Jeszenszky, S.Bachmann and P. Ranacher 159:3
3 Data
The linguistic data used in this study stem from the ‘Syntactic Atlas of German-speaking
Switzerland’ (SADS, [7][6]), a survey-based dialect atlas. The survey focuses on syntax, the
linguistic level concerned with grammatical structures at the level of sentences, such as
word order or declension. Syntax is assumed to differentiate at a slower rate over time than
other linguistic levels [14], making it suitable for the investigation of diachronic variation
(i.e., change through time).
The SADS survey was conducted between 2000 and 2002 with a dense sampling strategy
of 383 survey sites throughout the German-speaking area of Switzerland. Respondents
answered 118 questions related to about 50 syntactic phenomena. The survey specifically
aimed to capture as wide a range of syntactic variation as possible while maintaining
authenticity. Importantly, the SADS features several respondents (3-26) at each survey site,
drawn from different age groups and professions, so local variation is captured in each
dialectal variable. That is, different variants may co-occur at a single site.
In this study, 60 SADS variables are used, listed in [11]. The variables are assigned to five
syntactic domains [6], capturing different aspects of syntax: 1) Noun phrases, 2) Pronouns,
3) Verbal complex, 4) Secondary predication and 5) Sentential and phrasal conjunctions.
The spatial variables used in this study are the Euclidean distances and travel times
by car for the years 1950 and 2000, provided by the Institute for Transport Planning and
Systems at ETH Zurich [5].
4 Methodology
We apply methods from ecology to explore the relationship between the spatial factors and the
five syntactic domains (and all domains combined). For each domain we compute the linguistic
distances (LD) between all survey sites using the method in [11]. The LD are then arranged
in distance matrices. We perform a principal coordinate analysis (PCoA) [9] on each matrix
and retain the first k principal coordinates, which together explain at least 75% of the
variance. Thus, we capture most of the variation in each domain.
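The two bookkeeping steps of PCoA described above can be sketched as follows. This is an illustrative Python sketch, not the R workflow used in the study: the function names `gower_centre` and `n_axes_for_share` are hypothetical, the eigendecomposition itself is left out, and counting only positive eigenvalues towards the 75% share is an assumption about how negative eigenvalues would be handled.

```python
def gower_centre(dist):
    """Double-centre a symmetric distance matrix: B = -1/2 * J * D^2 * J.

    The principal coordinates are obtained from the eigenvectors of B,
    each scaled by the square root of its eigenvalue (not computed here).
    """
    n = len(dist)
    a = [[-0.5 * dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(row) / n for row in a]
    grand_mean = sum(row_mean) / n
    # The matrix is symmetric, so column means equal row means.
    return [
        [a[i][j] - row_mean[i] - row_mean[j] + grand_mean for j in range(n)]
        for i in range(n)
    ]


def n_axes_for_share(eigenvalues, share=0.75):
    """Smallest k such that the k largest positive eigenvalues reach `share`.

    Assumption: negative eigenvalues (possible for non-Euclidean distances)
    are ignored when computing the total.
    """
    positive = sorted((ev for ev in eigenvalues if ev > 0), reverse=True)
    total = sum(positive)
    running = 0.0
    for k, ev in enumerate(positive, start=1):
        running += ev
        if running / total >= share:
            return k
    return len(positive)


# Hypothetical toy input: three sites on a line at positions 0, 1 and 3.
toy_distances = [[0, 1, 3], [1, 0, 2], [3, 2, 0]]
centred = gower_centre(toy_distances)
```

In practice the eigenvalues passed to `n_axes_for_share` would come from an eigendecomposition of the centred matrix, for which a numerical library is the natural choice.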
Similarly, the travel times and the Euclidean distances are arranged in distance matrices.
We perform PCoA on the travel times and compute distance-based Moran’s eigenvector maps
(dbMEM) from the Euclidean distance matrix [4].
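The first step of building such eigenvector maps is usually described as truncating the distance matrix before the PCoA. The Python sketch below illustrates only that truncation, under stated assumptions: the threshold is taken as the longest edge of the minimum spanning tree, distances above it are replaced by four times the threshold (the PCNM-style convention), and the diagonal is left at zero. The function names are hypothetical, and the subsequent eigenvector extraction and selection are omitted.

```python
def mst_longest_edge(dist):
    """Longest edge of the minimum spanning tree, found with Prim's algorithm."""
    n = len(dist)
    in_tree = [False] * n
    best = [float("inf")] * n  # cheapest known link from the tree to each site
    best[0] = 0.0
    longest = 0.0
    for _ in range(n):
        # Add the site that is closest to the tree built so far.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        longest = max(longest, best[u])
        for v in range(n):
            if not in_tree[v] and dist[u][v] < best[v]:
                best[v] = dist[u][v]
    return longest


def truncate_distances(dist, threshold=None):
    """Keep distances up to the threshold; replace larger ones by 4 * threshold."""
    if threshold is None:
        threshold = mst_longest_edge(dist)
    n = len(dist)
    return [
        [dist[i][j] if dist[i][j] <= threshold else 4.0 * threshold for j in range(n)]
        for i in range(n)
    ]


# Hypothetical toy input: four sites on a line at positions 0, 1, 3 and 7.
positions = [0, 1, 3, 7]
toy = [[abs(p - q) for q in positions] for p in positions]
truncated = truncate_distances(toy)
```

The truncated matrix would then be passed through a PCoA, and the eigenvectors with positive eigenvalues retained as spatial predictors.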
We perform a redundancy analysis (RDA) to explore the linear relationship between the
linguistic domains and space. RDA is a direct extension of linear regression analysis to model
multivariate response data [1]. In RDA, both the predictor and the response may consist
of several variables, which are the principal coordinates in our case. RDA is performed
separately for each of the domains, revealing how much of the variation can be explained by
the spatial proxies for dialect contact.
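To make the regression step concrete, the following Python sketch computes the share of response variation explained by a multivariate linear regression, which is the quantity RDA builds on. It is a minimal illustration with hypothetical function names, not the vegan implementation used in the study; the principal component analysis of the fitted values that yields the canonical axes is omitted.

```python
def centre_columns(matrix):
    """Subtract each column's mean from its values."""
    n = len(matrix)
    means = [sum(col) / n for col in zip(*matrix)]
    return [[value - m for value, m in zip(row, means)] for row in matrix]


def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting.

    `b` may have several columns; the result has the same shape as `b`.
    """
    n = len(a)
    aug = [list(a[i]) + list(b[i]) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("predictors are linearly dependent")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [[aug[i][n + j] / aug[i][i] for j in range(len(b[0]))] for i in range(n)]


def rda_r_squared(predictors, responses):
    """Unadjusted R^2: fitted sum of squares divided by total sum of squares."""
    x = centre_columns(predictors)
    y = centre_columns(responses)
    x_cols = list(zip(*x))
    y_cols = list(zip(*y))
    xtx = [[sum(p * q for p, q in zip(ci, cj)) for cj in x_cols] for ci in x_cols]
    xty = [[sum(p * q for p, q in zip(ci, cj)) for cj in y_cols] for ci in x_cols]
    coef = solve(xtx, xty)  # least-squares coefficients via the normal equations
    fitted = [
        [sum(row[k] * coef[k][j] for k in range(len(coef))) for j in range(len(y_cols))]
        for row in x
    ]
    ss_fitted = sum(v * v for row in fitted for v in row)
    ss_total = sum(v * v for row in y for v in row)
    return ss_fitted / ss_total
```

Here each row would be a survey site, the response columns the retained linguistic principal coordinates, and the predictor columns the spatial coordinates or eigenvector maps.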
We compute the explained variance (adjusted R²) and perform an ANOVA-like significance
test. Finally, we generate a partition plot (Figure 1) to visualise the individual and combined
effects of the spatial variables on the linguistic domains.
The study has been carried out using the R packages ape [18] and vegan [17].
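The adjustment of R² and the logic behind a partition plot can be sketched in a few lines. The Python example below is illustrative only and does not reproduce the R code of the study: it uses the standard adjustment for the number of sites and predictors, and it partitions variation between two predictor sets, whereas the study uses three. The function names are hypothetical.

```python
def adjusted_r_squared(r2, n_sites, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - m - 1)."""
    return 1.0 - (1.0 - r2) * (n_sites - 1) / (n_sites - n_predictors - 1)


def partition_two_sets(adj_r2_a, adj_r2_b, adj_r2_ab):
    """Split explained variation into unique and shared fractions.

    adj_r2_a  : adjusted R^2 using predictor set A alone
    adj_r2_b  : adjusted R^2 using predictor set B alone
    adj_r2_ab : adjusted R^2 using both sets together
    """
    return {
        "unique_a": adj_r2_ab - adj_r2_b,
        "unique_b": adj_r2_ab - adj_r2_a,
        "shared": adj_r2_a + adj_r2_b - adj_r2_ab,
        "residual": 1.0 - adj_r2_ab,
    }


# Hypothetical values, chosen only to show the arithmetic.
fractions = partition_two_sets(adj_r2_a=0.30, adj_r2_b=0.25, adj_r2_ab=0.40)
```

The four fractions sum to one by construction. A shared fraction can come out negative in practice, which is consistent with a plot that displays only values above 0.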
5 Results and discussion
The preliminary results of the analysis provide a good starting point for interpreting the
effects of spatial factors on syntactic domains in Swiss German dialects. The adjusted R² values
differ for each syntactic domain, suggesting that the variation in some dialectal variables
is affected by space in a different way. The explained variances are as follows (levels of
significance are given by stars): 1) Noun phrases: 32.02%***, 2) Pronouns: 33.61%**, 3)
Verbal complex: 44.4%***, 4) Secondary predication: 25.53%* and 5) Sentential and phrasal
conjunctions: 33.93%***. The explained variance in all domains combined is 52.1%***.
Figure 1 Factors shaping the observed variation in syntactic domains in Swiss German dialects.
Adjusted R² values, as derived from RDA, are shown in the Venn diagram. Only values above 0
are shown.
Not surprisingly, the three spatial predictors appear redundant in how they explain the
variation in the syntactic domains. Together they account for most of the explained variation
in domains 3, 4 and 5. Travel time in 2000 is the best single predictor for domains 1
and 2. For domains 3 and 5, and for all domains combined, travel time in 1950 appears to
be the best single predictor, which is in line with previous findings [11]. Euclidean distance
alone appears to be inferior to travel times in predicting syntactic variation.
Considering the distribution of variants for the individual linguistic variables and comparing
them to those within the same syntactic domain, the following factors could contribute to
differences in explained variance between the five domains.
Possible factor 1: The disparity of the spatial distribution of variants (their extent in space
and their boundaries) across individual linguistic variables. This holds true especially for
domains 2 and 5, and less for domain 4.
Possible factor 2: The fuzziness of variants’ boundaries (cf. also [12]). Boundaries in domain 4
are usually rather sharp for at least one of the variants, although the subdomain of personal
pronouns in domain 2 exhibits extremely sharp borders as well.
Possible factor 3: The ubiquitous spatial distribution of at least two variants for the same
linguistic variable. Some subdomains in domain 2 (esp. pronoun clusters) and, to a smaller
extent, in domain 5 come close to the extreme case of congruence in spatial distribution.
Possible factor 4: The high number of variants for some syntactic variables, causing an overall
complex spatial distribution (e.g., one or multiple variants being completely superimposed by
one or more other variants). In domain 1, the number of variants appears to be rather high for
most of the linguistic variables.
Possible factor 5: The number of variables differs considerably between syntactic domains.
Moreover, several variables can be regarded as belonging to another domain because they are
closely related. This is especially true for the variables in domain 3. Thus the fuzzy nature
of grammatical categorisation is a confounding factor.
In this study the spatial variables have been used as proxies for the possibility of dialect
contact. The analysis yielded preliminary results towards identifying the factors that
facilitate the spread of dialectal variants in the variables investigated. In a more thorough
spatial analysis of individual variables, for example in an analysis of dialectal evolution,
the underlying variables found to explain most variation should be given a higher weight. Based
on the correspondences uncovered, a parameterisation of dialect contact potential is possible,
which would lead to a more informed quantification of language evolution.
It is also possible to perform similar analyses on linguistic distance matrices based on
individual linguistic variables. A more informed linguistic distance can be obtained by
decorrelating variables that address the same dialectal phenomenon and show a similar
distribution in space, as they are potentially multicollinear.
The approach can be extended to include additional predictors, for example socio-demographic
variables, population density, altitude, or Trudgill’s linguistic gravity index [23], which is
based on the population of survey sites and the distances between them. Categorical predictors
such as administrative borders, as used by Derungs et al. [3], can also be included.
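As an illustration of how a gravity-type predictor could be derived from populations and distances, the Python sketch below uses the form in which Trudgill’s index [23] is commonly cited: the influence of site i on site j is s · (P_i · P_j / d_ij²) · P_i / (P_i + P_j). The function name, the default similarity factor of 1 and the units of population and distance are assumptions made for this example, not specifications taken from the present study.

```python
def gravity_influence(pop_i, pop_j, distance, similarity=1.0):
    """Influence of site i on site j in a gravity-type model.

    pop_i, pop_j : populations of the two sites (any consistent unit)
    distance     : distance or travel time between the sites, must be > 0
    similarity   : prior linguistic similarity factor s (assumed 1.0 here)
    """
    if distance <= 0:
        raise ValueError("distance must be positive")
    interaction = (pop_i * pop_j) / distance ** 2
    # The asymmetry term makes the larger site exert the stronger influence.
    return similarity * interaction * pop_i / (pop_i + pop_j)


# Hypothetical sites: a town of 400 (thousand) and a village of 100 (thousand).
town_on_village = gravity_influence(400, 100, distance=10)
village_on_town = gravity_influence(100, 400, distance=10)
```

Because the index is asymmetric, a symmetric predictor matrix for an analysis such as RDA would require an additional choice, for example summing both directions, which is left open here.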
References
1 Daniel Borcard, François Gillet, and Pierre Legendre. Numerical Ecology with R. 2011.
doi:10.1017/CBO9781107415324.004.
2 Jack K. Chambers and Peter Trudgill. Dialectology. Cambridge University Press, Cambridge,
2nd edition, 2004.
3 Curdin Derungs, Christian D. Sieber, Robert Weibel, and Elvira Glaser. Borders in a
Dialect Landscape – Administration is more Formative than Economy or Religion. PLoS
ONE, under review.
4 Stéphane Dray, Pierre Legendre, and Pedro R. Peres-Neto. Spatial modelling: A comprehensive
framework for principal coordinate analysis of neighbour matrices (PCNM).
Ecological Modelling, 196(3-4):483–493, 2006. doi:10.1016/j.ecolmodel.2006.02.015.
5 Philipp Fröhlich, Thomas Frey, Serge Reubi, and Hans Ulrich Schiedt. Entwicklung
des Transitverkehrs-Systems und deren Auswirkung auf die Raumnutzung in der Schweiz
(COST 340): Verkehrsnetz-Datenbank. 2004. URL:
http://www.ivt.ethz.ch/vpl/publications/reports.
6 Elvira Glaser, editor. Syntaktischer Atlas der deutschen Schweiz. University of Zurich,
forthcoming.
7 Elvira Glaser and Gabriela Bart. Dialektsyntax des Schweizerdeutschen. In Roland Kehrein,
Alfred Lameli, and Stefan Rabanus, editors, Regionale Variation des Deutschen. Projekte
und Perspektiven, chapter 4, pages 79–105. De Gruyter, Berlin, 2015.
8 Charlotte Gooskens. Norwegian dialect distances geographically explained. In Britt-Louise
Gunnarson, Lena Bergström, Gerd Eklund, et al., editors, Language Variation in
Europe. Papers from the ICLAVE 2004, pages 195–206, Uppsala, 2004.
9 J. C. Gower. Some Distance Properties of Latent Root and Vector Methods Used in
Multivariate Analysis. Biometrika, 53(3/4):325, 1966. doi:10.2307/2333639.
10 Wilbert Heeringa. Measuring dialect pronunciation differences using Levenshtein distance.
PhD thesis, University of Groningen, 2004.
11 Péter Jeszenszky, Philipp Stoeckle, Elvira Glaser, and Robert Weibel. Exploring global
and local patterns in the correlation of geographic distances and morphosyntactic variation
in Swiss German. Journal of Linguistic Geography, 5(2), 2017. doi:10.1017/jlg.2017.5.
12 Péter Jeszenszky, Philipp Stoeckle, Elvira Glaser, and Robert Weibel. Crisp breaks vs.
continuous transitions: finding quantitative models for transitions between syntactic variants.
Journal of Linguistic Geography, in revision.
13 Sean Lee and Toshikazu Hasegawa. Oceanic barriers promote language diversification in
the Japanese Islands. Journal of Evolutionary Biology, 27(9):1905–1912, 2014.
doi:10.1111/jeb.12442.
14 Giuseppe Longobardi and Cristina Guardiano. Evidence for syntax as a signal of historical
relatedness. Lingua, 119(11):1679–1706, 2009. doi:10.1016/j.lingua.2008.09.012.
15 John Nerbonne. Data-Driven Dialectology. Language and Linguistics Compass, 3(1):175–198,
2009. doi:10.1111/j.1749-818X.2008.00114.x.
16 John Nerbonne and Peter Kleiweg. Toward a dialectological yardstick. Journal of
Quantitative Linguistics, 14(2):148–167, 2007.
17 Jari Oksanen, F. Guillaume Blanchet, Michael Friendly, et al. vegan: Community
ecology package. 2018. URL: cran.r-project.org/web/packages/vegan/vegan.pdf.
18 Emmanuel Paradis, Simon Blomberg, Ben Bolker, et al. ape: Analyses of phylogenetics
and evolution. 2018. URL: https://cran.r-project.org/web/packages/ape/ape.pdf.
19 Simon Pickl. Probabilistische Geolinguistik. Doctoral thesis, University of Salzburg, 2013.
doi:10.1063/1.3541948.
20 Jean Séguy. La relation entre la distance spatiale et la distance lexicale. Revue de
Linguistique Romane, 35(138):335–357, 1971.
21 Robert G. Shackleton Jr. English-American Speech Relationships: A Quantitative
Approach. Journal of English Linguistics, 33(2):99–160, 2005.
doi:10.1177/0075424205279017.
22 Benedikt Szmrecsanyi. Geography is overrated. In Sandra Hansen, Christian Schwarz,
Philipp Stoeckle, et al., editors, Dialectological and Folk Dialectological Concepts of Space,
pages 215–231. De Gruyter, Berlin, Boston, 2012.
23 Peter Trudgill. Linguistic change and diffusion: Description and explanation in
sociolinguistic dialect geography. Language in Society, 2:215–246, 1974.
24 Martijn Wieling, Simonetta Montemagni, John Nerbonne, and R. Harald Baayen. Lexical
differences between Tuscan dialects and standard Italian: Accounting for geographic and
sociodemographic variation using generalized additive mixed modeling. Language,
90(3):669–692, 2014. doi:10.1353/lan.2014.0064.