Analyzing Linguistic Data: A practical introduction to statistics using R
Abstract
Statistical analysis is a useful skill for linguists and psycholinguists, allowing them to understand the quantitative structure of their data. This textbook provides a straightforward introduction to the statistical analysis of language. Designed for linguists with a non-mathematical background, it clearly introduces the basic principles and methods of statistical analysis, using 'R', the leading computational statistics programme. The reader is guided step-by-step through a range of real data sets, allowing them to analyse acoustic data, construct grammatical trees for a variety of languages, quantify register variation in corpus linguistics, and measure experimental data using state-of-the-art models. The visualization of data plays a key role, both in the initial stages of data exploration and later on when the reader is encouraged to criticize various models. Containing over 40 exercises with model answers, this book will be welcomed by all linguists wishing to learn more about working with and presenting quantitative data.
... An overall effect for condition was tested for significance by using likelihood ratio tests comparing the model including condition to the model without it (cf. Baayen, 2008; Baayen et al., 2008; Winter, 2013). The model with an AIC value (Akaike's information criterion) of at least two points smaller was considered the better model (cf. ...
... There is an overall effect of condition as determined by the likelihood ratio test comparing the model including condition to the model without it (cf. Baayen, 2008; Baayen et al., 2008; Winter, 2013) [χ² ...
... An overall effect of condition and the interaction term, respectively, was tested for significance by performing a likelihood ratio test comparing the model including the fixed factor to the model without it (cf. Baayen, 2008; Baayen et al., 2008; Winter, 2013). The model with an AIC value (Akaike's information criterion) of at least two points smaller was considered the better model (cf. ...
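The model comparison described in these excerpts can be sketched roughly as follows. This is a minimal illustration, not the authors' analysis: the log-likelihoods and parameter counts are hypothetical values, the function names are invented for this example, and the p-value shortcut assumes the two nested models differ by exactly one parameter.

```python
import math

def aic(log_lik, n_params):
    """Akaike's information criterion: 2k - 2 * log-likelihood."""
    return 2 * n_params - 2 * log_lik

def lrt_df1(log_lik_reduced, log_lik_full):
    """Likelihood ratio test for nested models differing by ONE parameter.

    Returns (chi-square statistic, p-value). With 1 degree of freedom the
    chi-square upper tail probability equals erfc(sqrt(x / 2)).
    """
    stat = max(2 * (log_lik_full - log_lik_reduced), 0.0)
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Hypothetical values for illustration only (not taken from any study).
ll_without, k_without = -520.4, 4   # model without the factor
ll_with, k_with = -515.1, 5         # model including the factor

stat, p_value = lrt_df1(ll_without, ll_with)
delta_aic = aic(ll_without, k_without) - aic(ll_with, k_with)

# Decision rule from the excerpt: prefer the model whose AIC is
# at least two points smaller.
prefer_full = delta_aic >= 2
```

For a factor with more than two levels, the models differ by several parameters and the p-value would need the chi-square distribution with the matching degrees of freedom.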
This study investigates the interplay between alternation preferences and corrective focus marking in the production of German and English speakers. Both languages prefer an alternation of strong and weak, and both use pitch accenting to indicate focus structure. The objective of the study is to determine whether the preference for rhythmic alternation can account for variations in the prosodic marking of focus. Contrary to previous claims, the results obtained from three production experiments indicate that rhythmic adjustment strategies do occur during focus marking. However, despite the similarities between the two languages, they employ different strategies when alternation and focus marking work in opposite directions. German speakers often employ a melodic alternation of high and low by realizing the first of two adjacent focus accents with a rising pitch accent (L*H), while English speakers frequently omit the first focus accent in clash contexts. This finding is further supported by a second experiment that investigates pitch accent clashes in rhythm rule contexts under various focus environments. The findings suggest that the preference for alternation can influence the prosodic marking of focus and contributes to variation in the realization of information-structure categories.
... To assess whether the different case-marking conditions (nominative vs. topic markers) influence the bias of mentioning the subject or object in the previous clause, we fit a mixed-effects logistic regression model (Baayen 2008) to the likelihood of mentioning the subject of the previous clause in the participants' responses. The model included the fixed effects of verb bias (NP1, NP2), case marking (nominative, topic), and their interaction as well as the random effects of participant and item. ...
... The remaining data were statistically analyzed using a linear mixed-effects model (Baayen 2008; Baayen et al. 2008). To meet the normal distribution requirement of the data, all RTs were log-transformed (Ratcliff 1993). ...
... In light of Nariyama's (2002) account, we attribute the pronounced effect of topicality in our study to the function of the topic marker whose scope extends to the following clause when the subordinate clause is followed by the main clause. We thus conclude that the conjoined effects of increased salience of the explicit topic marker and its scopal property led to the current results (Arnold 1998, 2010; Au 1986; Baayen 2008). Future research is needed to investigate whether the topicality effect would also emerge in other complex clause contexts beyond implicit consequentiality sentences. ...
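As a small sketch of the log transformation mentioned in this excerpt: the excerpt does not state the base of the logarithm, so the natural log used here is an assumption, and the reaction times are hypothetical values.

```python
import math

def log_transform(rts_ms):
    """Natural-log transform of reaction times given in milliseconds."""
    if any(rt <= 0 for rt in rts_ms):
        raise ValueError("reaction times must be positive")
    return [math.log(rt) for rt in rts_ms]

# Hypothetical reaction times (ms); the last value mimics a slow response.
rts = [412.0, 530.0, 388.0, 2150.0]
log_rts = log_transform(rts)
```

The transform compresses the long right tail typical of reaction times, which is why it is commonly applied before fitting a linear model.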
There is little consensus as to whether the use of implicit causal biases is driven exclusively by verb semantics or mediated by an interaction of verb semantics and other information sources. We tested whether the topic status of a subject modulates Korean speakers’ referential choices and processing in the interpretation of implicit consequentiality information. Results from two sentence-completion tasks (Experiment 1) showed more subject reference in participants’ continuations when the preceding subject was marked by the topic rather than the nominative marker, regardless of the directionality of the implicit consequentiality bias. In a self-paced reading task (Experiment 2), Korean speakers spent shorter reading times when the referent in the consequence clause was resolved as referring to the previous subject than when it referred to the previous object, although only in the topic-marked condition and not in the nominative-marked condition. Our results suggest that the implicit consequentiality effect remains consistent regardless of the subject’s topic status in the offline tasks, but the effect interacts with the topicality effect in real-time sentence processing. We discuss the implications of our findings for assumptions concerning the underlying mechanisms of referential resolution in discourse including causal bias verbs.
... To test for the distribution of grammatical aspect per flavor, we employed the glmer function in the statistical package lme4 in R (R Core Team, 2013). The data was fitted into a generalized linear mixed (logit) model using the maximum likelihood method (Laplace Approximation) (Baayen, 2008; Dixon, 2008; Matuschek et al., 2017). Fixed effects included flavor (epistemic vs. root) and aspect (grammatical aspect vs. bare), as well as the interaction between flavor and aspect. ...
... the maximum likelihood method (Laplace Approximation) (Baayen, 2008; Dixon, 2008; Matuschek et al., 2017). Fixed effects included flavor (epistemic vs. root) and lexical aspect (stative vs. eventive), as well as the interaction between flavor and aspect. ...
... To test for the distribution of flavor (epistemic vs. root) by stativity, we employed the glmer function in the statistical package lme4 in R (R Core Team, 2013). The data was fitted into a generalized linear mixed (logit) model using the maximum likelihood method (Laplace Approximation) (Baayen, 2008; Dixon, 2008; Matuschek et al., 2017). In this and later models, we treat flavor of usage as the dependent variable, and observable features of the utterances as independent variables, essentially asking if the observable condition leads to a significantly greater likelihood of one flavor over the other; for the child, does what they can observe predict flavor of use? ...
This paper investigates how children figure out that modals like must can be used to express both epistemic and “root” (i.e. non-epistemic) flavors. The existing acquisition literature shows that children produce modals with epistemic meanings up to a year later than with root meanings. We conducted a corpus study to examine how modality is expressed in speech to and by young children, to investigate the ways in which the linguistic input children hear may help or hinder them in uncovering the flavor flexibility of modals. Our results show that the way parents use modals may obscure the fact that they can express epistemic flavors: modals are very rarely used epistemically. Yet, children eventually figure it out; our results suggest that some do so even before age 3. To investigate how children pick up on epistemic flavors, we explore distributional cues that distinguish roots and epistemics. The semantic literature argues they differ in “temporal orientation” (Condoravdi, 2002): while epistemics can have present or past orientation, root modals tend to be constrained to future orientation (Werner, 2006; Klecha, 2016; Rullmann & Matthewson, 2018). We show that in child-directed speech, this constraint is well-reflected in the distribution of aspectual features of roots and epistemics, but that the signal might be weak given the strong usage bias towards roots. We discuss (a) what these results imply for how children might acquire adult-like modal representations, and (b) possible learning paths towards adult-like modal representations.
... Political discourse has also been studied in its own right. Focusing on government discourse in French, Labbé & Monière (2003; 2008) ... From these analyses, one can conclude that institutions tend to partially erase the differences between parties when their leaders hold power. The time factor also tends to explain the variation between presidents or prime ministers. ...
... Applying this approach to each pair of presidential profiles yields a symmetric matrix (10 x 10 = 100 values) that is difficult to interpret. To bring out the proximities, we apply an automatic classification to these data (Baayen 2008), using a technique derived from phylogenetic trees (Paradis 2011). ...
Over the past sixty-six years, eight presidents successively headed the Fifth French Republic (de Gaulle, Pompidou, Giscard d'Estaing, Mitterrand, Chirac, Sarkozy, Hollande, Macron). After presenting the corpus of their speeches (9,202 texts and more than 20 million labelled words), the style of each of them will be characterized by their vocabulary (lemmas and part-of-speech). A deeper analysis reveals the typical sequences of each tenant of the Elysée. Based on an intertextual distance between all presidential speeches, a synthesis can be drawn reflecting the similarities and differences between presidents.
... Data were analysed using cumulative logit models performed with the ordinal package (Christensen, 2019) in R version 3.2.3 (Baayen, 2008; Baayen et al., 2008; Bates et al., 2014). Each model included Helmert coding: for Experiment 1b, we compared (a) who_obj, how, and why with that as a baseline; (b) who_obj and how with why; and (c) who_obj and how. ...
... Data were analysed using linear mixed-effect regression models performed with the lme4 package in R version 3.2.3 (Baayen, 2008; Baayen et al., 2008; Bates et al., 2014). Each model included Helmert coding, where at the deeply embedded N, we compared (a) who_obj, how, why with that as a baseline; (b) who_obj and how with why; and (c) who_obj with how. At the adverb before the third verb and the third verb region, we compared (a) who_obj, how, why with that as a baseline; (b) who_obj with how and why and (c) why with how. ...
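One way to picture the first set of Helmert comparisons in this excerpt is as a contrast matrix. The sketch below is an illustration under assumptions: the level order (that, why, how, who_obj), the label `who_obj`, the sign direction, and the fractional scaling are choices made for this example, since the excerpt does not report the numeric codes actually used.

```python
from fractions import Fraction

def helmert_contrasts(levels):
    """Build Helmert-style contrasts: each level vs. the mean of all later levels.

    Returns a list of k - 1 dicts mapping level name to contrast weight.
    """
    k = len(levels)
    if k < 2:
        raise ValueError("need at least two levels")
    contrasts = []
    for j in range(k - 1):
        remaining = k - j  # level j plus every level after it
        column = {}
        for i, level in enumerate(levels):
            if i < j:
                column[level] = Fraction(0)          # already compared earlier
            elif i == j:
                column[level] = Fraction(-(remaining - 1), remaining)
            else:
                column[level] = Fraction(1, remaining)
        contrasts.append(column)
    return contrasts

# Assumed level order for illustration.
levels = ["that", "why", "how", "who_obj"]
contrasts = helmert_contrasts(levels)
# contrasts[0]: that vs. mean of (why, how, who_obj)  -> comparison (a)
# contrasts[1]: why vs. mean of (how, who_obj)        -> comparison (b)
# contrasts[2]: how vs. who_obj                       -> comparison (c)
```

Each contrast sums to zero and the contrasts are mutually orthogonal, which is what lets the three comparisons be estimated independently of one another in a balanced design.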
We present experimental evidence showing that different wh-filler-gap dependencies are processed differently, depending on their syntactic licensors. Our studies compared the active storage profiles for why, how, and who (serving as subject or object of the verb). The results of offline and online experiments revealed that these wh-fillers are stored in memory for different durations, and predictably so based on the hypothesized structural distance between each wh-filler and the licensor which determines its grammatical and interpretive functions. Furthermore, the results showed that once the wh-filler is licensed, it is integrated to the current structure, and no longer engenders additional memory costs. Based on these findings, we argue that the mechanism of online sentence processing may employ both storage and integration components in memory.
... Many experimental investigations of SLA have involved at least one ANOVA design with two or more factors and one or more tested interactions (e.g., Plonsky & Gass, 2011; Lindstromberg, 2016; Plonsky, 2013). Studies using factorial ANOVA are still plentifully reported in SLA publications even though it is increasingly well-known that traditional ANOVA is inferior to mixed-effects regression with respect to a number of common types of SLA experimental study, such as ones that include collection and analysis of learners' responses to sets of lexical items (e.g., Baayen, 2008). Consequently, it makes sense to consider factorial ANOVA in relation to two themes introduced further above, namely, pre-study estimation of (1) expected statistical power and (2) the risk of M error. ...
... For example, it is now known that certain types of traditional inferential statistical analysis, such as classical fixed-effects ANOVA, are suboptimal for a range of highly relevant experimental designs, including ones in which multiple learners respond to the same linguistic items. For such designs SLA researchers increasingly use mixed-effects (ME) multiple regression (e.g., Baayen, 2008; Linck & Cunnings, 2015). Two typical advantages of ME regression are better power than is afforded by a corresponding traditional procedure and optimized estimation of true effects through multi-level averaging, or "shrinkage" (e.g., Greenland, 2000; McElreath, 2020). ...
A classical prospective power analysis estimates the chance of obtaining a statistically significant result. However, it does so with no regard to the reliability of the result. “Design analysis” is a complementary component of study planning which addresses that limitation. Monte Carlo simulations and innovative freeware were used to provide illustrations of common, potentially grave problems that make design analysis necessary. Five statements outline widely known background to those problems. (1) A regime of significance testing tends to engender publication bias. (2) Small-sample studies commonly have very low expected statistical power. (3) Many SLA (quasi-)experimental studies have used small samples. (4) A combination of publication bias and low average power seeds a research literature with findings that well-powered replication studies do not repeat. (5) Published estimates of true effects are often too high. As to the last statement, many SLA researchers may be unaware of the mathematical basis of the tendency for obtained significant estimates to be too high whenever expected power is not high, which it frequently is not. Indeed, if power is too low, a significant result can only be misleading: Any good estimate will be nonsignificant. The design analysis procedures used to illustrate the focal problems can also be used to estimate sample sizes required for adequate expected power along with good control of the risk of obtaining very misleading significant results.
... After all, there are numerous circumstances in which the residual variance terms are difficult to interpret. For instance, difficulties arise when the other random effects variances differ in terms of how well they can be estimated: In an LMM, hierarchical shrinkage is imposed over the individual-level deviations, which in turn reduces the variance estimates (e.g., Baayen, 2008). This means that if the data is relatively sparse, and therefore shrinkage is considerable, then the residual variance will not exclusively capture within-participant trial-by-trial variability. ...
... In the simulated data used by vDAHSW, all random-effects terms are by-participant terms, which makes the interpretation of the residual variance term clear. However, one of the main reasons for employing LMMs is their ability to simultaneously account for multiple sources of stochasticity using so-called crossed random effects (Baayen, 2008; Judd et al., 2012, 2017). In a model with crossed random effects, the residual variance term reflects all the variability that is not accounted for by the other random-effects terms. ...
Statistical modeling is generally meant to describe patterns in data in service of the broader scientific goal of developing theories to explain those patterns. Statistical models support meaningful inferences when models are built so as to align parameters of the model with potential causal mechanisms and how they manifest in data. When statistical models are instead based on assumptions chosen by default, attempts to draw inferences can be uninformative or even paradoxical—in essence, the tail is trying to wag the dog. These issues are illustrated by van Doorn et al. (this issue) in the context of using Bayes Factors to identify effects and interactions in linear mixed models. We show that the problems identified in their applications (along with other problems identified here) can be circumvented by using priors over inherently meaningful units instead of default priors on standardized scales. This case study illustrates how researchers must directly engage with a number of substantive issues in order to support meaningful inferences, of which we highlight two: The first is the problem of coordination, which requires a researcher to specify how the theoretical constructs postulated by a model are functionally related to observable variables. The second is the problem of generalization, which requires a researcher to consider how a model may represent theoretical constructs shared across similar but non-identical situations, along with the fact that model comparison metrics like Bayes Factors do not directly address this form of generalization. For statistical modeling to serve the goals of science, models cannot be based on default assumptions, but should instead be based on an understanding of their coordination function and on how they represent causal mechanisms that may be expected to generalize to other related scenarios.
... Including random slopes allows the model to capture this variability in individual subjects' responses to the variable 'order'. While regressions-out are reported with odds ratios, standard errors, and p-values, the results for total times also include coefficients, standard errors, and t-values for each fixed effect and interaction (see Baayen et al., 2008 for absolute t-value in linear mixed-effects models). Data, scripts, and stimuli for all experiments are available at https://osf.io/ubxkr/?view_only=01bc62fb4fd0460f9e7831b0eedfbe4e. ...
German has two demonstrative pronouns: the der, die, das paradigm and the dieser, diese, dies(es) paradigm. Previous studies mainly compared the anaphoric use of der with the personal pronoun er and observed that der refers to less prominent antecedents. However, there are only very few studies that have investigated the differences between these two demonstrative pronouns. We hypothesize that they differ in signaling topic persistence and in accessing contrastive antecedents. We tested these hypotheses in short texts that manipulated the contrast of the antecedent by inducing the expression ‘in contrast to’ vs. ‘together with’ (e.g., the cellist in contrast to the flautist vs. the cellist together with the flautist). Results from our eye-tracking reading Experiment (Experiment 1), in which participants’ eye movements were monitored while reading sentences, show that (i) readers preferred dieser when referring to the topic of a sentence, and (ii) dieser caused less processing difficulties than der in both contrast and no-contrast contexts. Our sentence completion Experiment (Experiment 2) also confirmed that der and dieser are both used for anaphoric reference to a topical antecedent. Collectively, our experiments provide evidence that dieser functions as inducing topic persistence. These results suggest that there is a need for further experimental investigation into the semantic factors and informational structures influencing the usage of demonstrative pronouns in German.
... The data were analyzed using linear mixed-effects regression models. The significance of the main effects and of the interaction was assessed using the Satterthwaite approximation computed in the lmerTest package. Following Baayen (2008), and to ensure that the results of our model were not influenced by a few atypical data points, data whose residuals exceeded 2.5 standard deviations were treated as outliers and removed. Although the analyses were carried out on natural-log-transformed RTs, for the sake of clarity we present in the following section the results (i.e., graphs, means, standard deviations) for the untransformed RTs. ...
Many studies have shown that gender stereotypes play an important role in language comprehension, notably in recognition or lexical judgment tasks presented after stereotyped primes, for example in the form of occupation or role nouns (e.g., Cacciari & Padovani, 2007; Gygax et al., 2008). Our study takes a further step in assessing the role of gender stereotypes by testing their activation in the absence of a stereotyped prime. In a lexical decision task, participants had to decide, after reading a prime that was semantically related to the target word but not stereotyped (for example fusée 'rocket' and astronaute 'astronaut'), whether stereotyped occupation nouns with an epicene form were real words, sometimes with a match between the preceding determiner and the stereotype (un astronaute), sometimes with a mismatch between the two (une astronaute). The results indicate that reaction times are longer when the determiner is not congruent with the stereotype of the word it precedes. To our knowledge, our study is thus one of the first to show an influence of gender stereotypes on lexical access in French in the absence of a stereotyped prime.
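The residual-based trimming described in this excerpt (removing data points whose model residuals exceed 2.5 standard deviations) might look roughly like the sketch below. It is an illustration only: the residuals are hypothetical values rather than output from a fitted mixed model, and the function name is invented for this example.

```python
import statistics

def trim_by_residual(observations, residuals, cutoff=2.5):
    """Keep observations whose absolute residual is within cutoff * SD.

    `residuals` are assumed to come from an already fitted model; the SD is
    the sample standard deviation of those residuals.
    """
    if len(observations) != len(residuals):
        raise ValueError("observations and residuals must have equal length")
    sd = statistics.stdev(residuals)
    return [obs for obs, res in zip(observations, residuals)
            if abs(res) <= cutoff * sd]

# Hypothetical data: 19 small residuals and one large one (the last).
observations = list(range(20))            # stand-ins for trial identifiers
residuals = [0.1, -0.1] * 9 + [0.05, 3.0]
kept = trim_by_residual(observations, residuals)
```

In practice the model is usually refitted on the trimmed data, so that the reported estimates are not driven by the removed points.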
... For a detailed presentation of regression with nominal outcomes, see Agresti (2002), Harrell (2015), and Hosmer et al. (2013). With respect to linguistic data specifically, see Baayen (2008), Speelman (2014), and Speelman et al. (2018). ...
This study considers an approach to alternations in which constructions are understood as non-binary choices between non-discrete usage patterns. To these ends, it seeks to develop usage-based methods for the identification and description of constructions without presupposing their level of formal granularity. Instead of deciding a priori what level of granularity is best for making generalizations about grammatical structure, the study aims to integrate the dimension of taxonomic variation into the analysis by treating constructions as combinatory emergent patterns, rather than predetermined discrete objects. Using the behavioural profile approach, we examine a 12-way lexico-constructional choice in Polish arising from the combinatory possibilities of three paradigmatic relations: grammatical aspect (perfective vs. imperfective); grammatical prefix ( wy- , za- , na- ); and predicate choice from the semantic frame of “stuff-fill” (- pchać /- pychać ‘push’, - pełnić /- pełniać ‘fill’). We analyse the combinations in a sample of 765 examples extracted from the National Corpus of Polish. The results reveal patterns in the use of the prefix-aspect-verb composites, interpretable as speaker choice, and show how those combinatory patterns can be accounted for without the need for positing discrete alternations. Furthermore, although only exploratory, such results call into question the descriptive validity of the traditional grammatical alternation.
... Single-item accuracy was predicted by length, lexical frequency, orthographic neighborhood, bigram frequencies by position, age of acquisition, concreteness, number of senses, semantic neighborhood density, number of syllables, number of phonemes and phonological neighborhood as fixed effects, with by-item and by-subject random intercepts as random effects. |z| values beyond 1.96 were deemed as significant [117]. Bigram frequencies by position and number of senses were logarithmically transformed to normalize these variables. ...
Background
Irregular word reading has been used to estimate premorbid intelligence in Alzheimer’s disease (AD) dementia. However, reading models highlight the core influence of semantic abilities on irregular word reading, which shows early decline in AD. The primary objective of this study is to ascertain whether irregular word reading serves as an indicator of cognitive and semantic decline in AD, potentially discouraging its use as a marker for premorbid intellectual abilities.
Method
Six hundred eighty-one healthy controls (HC), 104 subjective cognitive decline, 290 early and 589 late mild cognitive impairment (EMCI, LMCI) and 348 AD participants from the Alzheimer’s Disease Neuroimaging Initiative were included. Irregular word reading was assessed with the American National Adult Reading Test (AmNART). Multiple linear regressions were conducted predicting AmNART score using diagnostic category, general cognitive impairment and semantic tests. A generalized logistic mixed-effects model predicted correct reading using extracted psycholinguistic characteristics of each AmNART word. Deformation-based morphometry was used to assess the relationship between AmNART scores and voxel-wise brain volumes, as well as with the volume of a region of interest placed in the left anterior temporal lobe (ATL), a region implicated in semantic memory.
Results
EMCI, LMCI and AD patients made significantly more errors in reading irregular words compared to HC, and AD patients made more errors than all other groups. Across the AD continuum, as well as within each diagnostic group, irregular word reading was significantly correlated to measures of general cognitive impairment / dementia severity. Neuropsychological tests of lexicosemantics were moderately correlated to irregular word reading whilst executive functioning and episodic memory were respectively weakly and not correlated. Age of acquisition, a primarily semantic variable, had a strong effect on irregular word reading accuracy whilst none of the phonological variables significantly contributed. Neuroimaging analyses pointed to bilateral hippocampal and left ATL volume loss as the main contributors to decreased irregular word reading performances.
Conclusions
While the AmNART may be appropriate to measure premorbid intellectual abilities in cognitively unimpaired individuals, our results suggest that it captures current semantic decline in MCI and AD patients and may therefore underestimate premorbid intelligence. On the other hand, irregular word reading tests might be clinically useful to detect semantic impairments in individuals on the AD continuum.
... Models at least included random intercepts for both subject and item. We also included by-subject random slopes for all fixed effects as well as their interaction (Baayen 2008 and Kush et al. 2019), unless the model failed to converge, in which case we removed the by-subject random slope for the interaction. ...
Ross (1967) observed that the coordinate structure constraint can be violated in certain semantically asymmetric structures. In this article we consider one of these structures, namely type A coordination, in detail (the terminology is from Lakoff 1986; an example is Here's the whisky I went to the store and bought ). We present experimental evidence showing that the pattern of argument and adjunct extraction from type A coordinate structures matches the pattern of argument and adjunct extraction from structures containing rationale clauses in all crucial respects. This near-perfect parallel behavior suggests that, like rationale clauses, the second conjunct in a type A coordination is an adjunct (see also Brown 2017). We explore the consequences of this finding for both interpretive and syntactic analyses of asymmetric coordination.
... In order to analyze the reading times, we conducted linear mixed-effects models using R (R Core Team, 2020). Models were built by using Baayen's approach (Baayen, 2008), meaning that fixed effects were added separately to the model in a stepwise manner. Each time a fixed effect was added, we conducted a log-likelihood test using the anova() function of the {base R} package (R Core Team, 2020) in order to compare the new model to the previous one that did not include it. ...
When learning a second language, some connectives are more difficult to acquire and to master than others. While previous research has assessed different factors responsible for this difficulty by using offline tasks, little is known about the extent to which L2-readers are sensitive to different connectives while reading. In our study, we compared self-paced reading times of native and non-native speakers of French for sentences that were correctly or incorrectly marked with either the infrequent connective cependant (‘however’) or the more frequent mais (‘but’). Results showed that incorrect uses only produced longer reading times when mais was used. Yet, in a sentence-evaluation-task using the same set of sentences, L2-speakers were able to discriminate incorrectly marked sentences with cependant from correctly marked ones. We conclude that a good theoretical understanding of connectives for L2 (Experiment 2) does not always warrant a quick activation of their meaning while reading (Experiment 1).
... Markov Chain Monte Carlo methods were used to compute the 95% confidence intervals for the effects and their significance at the .05 level [11]. Table 1 shows the effects of PlaceArtic. ...
... This resulted in the exclusion of 1.92% of data for dwell time, 1.85% for the first run dwell time, and 1.3% of the regression path duration. A statistical significance of .05 was indicated by values of [t or z] > 1.96 (Baayen 2008). For the sake of brevity, only significant effects are reported in the text. ...
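The 1.96 cutoff in this excerpt corresponds to a two-sided test at the .05 level under a standard normal distribution. The sketch below illustrates that link; treating a t-value as if it were a z-value is an approximation that is only reasonable with many observations, and the function names are invented for this example.

```python
import math

def two_sided_p_from_z(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))

def is_significant(stat, threshold=1.96):
    """Apply the |t| or |z| > 1.96 rule of thumb from the excerpt."""
    return abs(stat) > threshold

p_at_cutoff = two_sided_p_from_z(1.96)   # very close to .05
```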
Previous research has shown that word length, frequency and word repetition influence word reading times (Rayner 1998; 2009). Guidelines for Easy Language advise writers to use frequent and short words, and to repeat words instead of using synonyms. However, some of these guidelines are based on research that has been misinterpreted, simplified, or is outdated (Wengelin 2015), and studies focusing on effects of word length, frequency and word repetition among adult readers in the Easy Swedish target group are lacking. This eye-tracking study investigated the reading of Easy Language texts written by public authorities, as well as the effects of word length, frequency, and word repetition on readers in a day centre for people with intellectual disabilities. The results showed significant effects for word length and frequency in all readers. In addition, the effects were significantly greater in the target group than in the control group. The effects for word repetition were not as clear, affecting only one of the reading measures. Furthermore, the study revealed poor comprehension rates in the target group, i.e., when asked, they were not able to reproduce the main contents of the texts. The significantly greater effects of word length and frequency suggest that the related Easy Language guidelines are valid for this group of readers. The poor comprehension rates indicate that the texts were too difficult for these readers.
... For a detailed presentation of regression with nominal outcomes, see Agresti (2002), Harrell (2015), and Hosmer et al. (2013). With respect to linguistic data specifically, see Baayen (2008), Speelman (2014), and Speelman et al. (2018). ...
Depending on the theory of language employed, the paradigmatic and lexical variation associated with a given composite form-meaning pair is treated in different ways. First, variation can be treated as independent of the constructional semantics, an approach typical of modular theories. Second, paradigmatic variation can be considered indicative of constructional semantics; its variation constituting networks of closely related families of constructions. This is a common approach in construction grammar. Third, there exists a trend in cognitive linguistics and construction grammar to treat grammatical constructions as non-discrete emergent clusters of many-to-many form-meaning mappings. This study explores the possibility of extending current methods for quantitatively modelling construction grammar to an approach that does not assume discrete grammatical constructions. The speaker choice examined consists of the English future constructions will and BE going to and their use in contemporary informal British English. The constructions are examined with the behavioural profile approach. Three different regression modelling methods are applied to the grammatical alternations, each operationalizing one of the theoretical assumptions. While the results show that all three approaches are feasible and comparable in predictive accuracy, model interpretation becomes increasingly difficult with added complexity.
... To test our hypothesis, the data was analyzed using Generalized Linear Mixed Models (GLMMs; [67]) with binomial error structure and logit link function implemented in R Software version 4.0.3 and R studio version 1.4.1106. In each model, emotion recognition accuracy (as a binomial variable: correct vs. false) was the outcome variable, session number served as the control variable, and subject ID as the random intercept. ...
Person-related variation has been identified in many socio-cognitive domains, and there is evidence for links between certain personality traits and individual emotion recognition. Some studies, utilizing the menstrual cycle as a hormonal model, attempted to demonstrate that hormonal fluctuations could predict variations in emotion recognition, but their findings have been inconsistent. Remarkably, the interplay between hormone fluctuations and other person-related factors that could potentially influence emotion recognition remains understudied. In the current study, we examined if the interactions of emotion-related personality traits, namely openness, extraversion, and neuroticism, and the ovulatory cycle predict individual variation in facial emotion recognition in healthy naturally cycling women. We collected salivary ovarian hormone measures from N = 129 (n = 72 validated via LH test) women across their late follicular and mid-luteal phases of the ovulatory cycle. The results revealed a negative association between neuroticism scores and emotion recognition when progesterone levels (within-subject) were elevated. However, the results did not indicate a significant moderating influence of neuroticism, openness, and extraversion on emotion recognition across phases (late follicular vs. mid-luteal) of the menstrual cycle. Additionally, there was no significant interaction between openness or extraversion and ovarian hormone levels in predicting facial emotion recognition. The current study suggests future lines of research to compare these findings in a clinical setting, as both neuroticism and ovarian hormone dysregulation are associated with some psychiatric disorders such as premenstrual dysphoric disorder (PMDD).
... We argue that it is not obvious how we should weigh such disadvantages and advantages in LLMs' language learning. LLMs make wrong assumptions about language acquisition in key respects, but all scientific models do this (Box, 1979; Baayen, 2008), while this does not render such models useless: they can provide a lower bound on what linguistic phenomena are learnable in principle from distributional information. ...
... Estimates (β), standard errors (SE), and t-statistics are reported for all GCAs. The significance of effects was determined by assessing whether the associated t-statistics had absolute values of ≥ 2 [76,77]. ...
This study examines the phonological co-activation of a task-irrelevant language variety in mono- and bivarietal speakers of German with and without simultaneous interpreting (SI) experience during German comprehension and production. Assuming that language varieties in bivarietal speakers are co-activated analogously to the co-activation observed in bilinguals, the hypothesis was tested in the Visual World paradigm. Bivarietalism and SI experience were expected to affect co-activation, as bivarietalism requires communication-context based language-variety selection, while SI hinges on concurrent comprehension and production in two languages; task type was not expected to affect co-activation as previous evidence suggests the phenomenon occurs during comprehension and production. Sixty-four native speakers of German participated in an eye-tracking study and completed a comprehension and a production task. Half of the participants were trained interpreters and half of each sub-group were also speakers of Swiss German (i.e., bivarietal speakers). For comprehension, a growth-curve analysis of fixation proportions on phonological competitors revealed cross-variety co-activation, corroborating the hypothesis that co-activation in bivarietals’ minds bears similar traits to language co-activation in multilingual minds. Conversely, co-activation differences were not attributable to SI experience, but rather to differences in language-variety use. Contrary to expectations, no evidence for phonological competition was found for either same- or cross-variety competitors in either production task (interpreting- and word-naming variety). While phonological co-activation during production cannot be excluded based on our data, the effects of additional demands involved in a production task hinging on a language-transfer component (oral translation from English to Standard German) merit further exploration in light of a more nuanced understanding of the complexity of the SI task.
... We excluded trials with incorrect responses and results exceeding ±2.5 S.D. from the average (approximately 8.4% of the data) (Baayen 2008). We applied this exclusion procedure to the behavioral data. ...
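The ±2.5 SD exclusion rule in this excerpt can be sketched as follows. This is a minimal illustration, assuming the cutoff is computed once over all retained trials using the sample standard deviation; the excerpt does not say whether trimming was done overall, per participant, or per condition. The function name and the response times are hypothetical.

```python
from statistics import mean, stdev


def trim_by_sd(values, n_sd=2.5):
    """Return (kept values, number removed) after a mean +/- n_sd * SD trim."""
    center = mean(values)
    spread = stdev(values)  # sample standard deviation (n - 1 denominator)
    lower = center - n_sd * spread
    upper = center + n_sd * spread
    kept = [v for v in values if lower <= v <= upper]
    return kept, len(values) - len(kept)


if __name__ == "__main__":
    # Hypothetical response times in ms, with one very slow trial.
    rts = [512, 498, 530, 505, 487, 521, 509, 495, 518, 502,
           490, 515, 507, 499, 524, 511, 493, 506, 3200]
    kept, removed = trim_by_sd(rts)
    print(f"removed {removed} of {len(rts)} trials")
```

Because a single extreme value inflates both the mean and the SD, this kind of trimming is conservative in small samples; that is one reason other excerpts in this list trim on model residuals instead.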
... To assess the effects of the methodological design on the counts of the species, we used a generalized linear mixed modeling approach (GLMM; McCullagh and Nelder, 1989; Baayen, 2008; Bolker et al., 2009). Because at this stage our main intent was to highlight differences between designs rather than between methods, we did not compare results between coverboards and VES, but only within each method. ...
Monitoring of wildlife populations is essential for their conservation and requires a carefully chosen methodology. We compared survey effectiveness of reptiles using coverboards and visual encounter surveys in two study sites in the Italian Alps with similar habitats and reptile communities. The two sites shared similar methodologies, cover boards and visual encounter surveys (VES), except for the temporal approach, with one employing a long-lasting monitoring scheme and the other operating on a much shorter time-frame. Coverboards were placed two years before the beginning of the monitoring in the first site, while they were installed only for ten days and then removed each year in the second site. Similarly, VES were spread across the whole reptile activity season (May-September) in the first site, while conducted over nine consecutive days in the second site. Although the observation rate of any species was mainly associated with its relative abundance, reptiles preferred long-established coverboards and all three species present (Zootoca vivipara, Anguis veronensis and Vipera berus) were found underneath them. Only Zootoca vivipara used recently installed ones. On the other hand, short-term daily visual encounter surveys led to a much higher observation rate of Z. vivipara than those spread over the entire season. Our results suggest that coverboards may provide a valuable monitoring tool for reptiles when projects are conducted over long periods. Conversely, when only short-term assessments are possible, no real difference exists between the two methods and observation rate is more influenced by the species abundance than by the chosen method.
... All continuous factors were centred and scaled before entering in the models, and collinearity was checked by means of the condition number k before running the regression analyses (Belsley et al., 1980). The result indicated no potentially harmful collinearity (k = 2.36 < 30; Baayen, 2008). ...
The present study aimed to examine whether Mandarin-speaking children on the autism spectrum showed differences in comprehending spatial demonstratives (“this” and “that”, and “here” and “there”), as compared to typically developing (TD) children. Another aim of this study was to investigate the roles of theory of mind (ToM) and executive functions (EF) in the comprehension of spatial demonstratives. Twenty-seven autistic children (mean age 6.86) and 27 receptive-vocabulary-matched TD children (mean age 5.82) were recruited. Demonstrative comprehension was assessed based on participants’ ability to place objects in certain locations according to experimenters’ instructions which involved these demonstratives in three different conditions (same-, opposite-, and spectator-perspective conditions). Four false-belief tasks were administered to measure ToM, and the word-span task and the dimensional change card sort task were used to measure two subcomponents of EF – working memory and mental flexibility – respectively. Children on the autism spectrum were found to score below TD children in the comprehension of spatial demonstratives. In addition, the results showed that ToM and working memory were conducive to the correct interpretation of spatial demonstratives. The two cognitive abilities mutually influenced their respective roles in spatial demonstrative comprehension in the three different conditions. The findings suggest that the comprehension of spatial demonstratives is an area of need in Mandarin-speaking children on the autism spectrum, and it might be linked to their differences in cognitive abilities.
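The collinearity check in the excerpt above (condition number k compared against a threshold of 30) can be illustrated with a short sketch. It assumes one common definition: scale each column of the predictor matrix to unit length, then take the ratio of the largest to the smallest singular value. Whether the cited study included an intercept column or used exactly this scaling is not stated, so both choices are assumptions here, as is the function name.

```python
import numpy as np


def condition_number(X):
    """Ratio of largest to smallest singular value after unit-length column scaling."""
    X = np.asarray(X, dtype=float)
    # Scale each column to unit Euclidean length (assumes no all-zero column).
    norms = np.linalg.norm(X, axis=0)
    X_scaled = X / norms
    singular_values = np.linalg.svd(X_scaled, compute_uv=False)
    return singular_values.max() / singular_values.min()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = rng.normal(size=200)
    b = rng.normal(size=200)               # generated independently of a
    c = a + rng.normal(scale=0.01, size=200)  # nearly a copy of a
    print("independent predictors:", condition_number(np.column_stack([a, b])))
    print("near-duplicate predictors:", condition_number(np.column_stack([a, c])))
```

Under this definition, perfectly orthogonal predictors give k = 1, and k grows without bound as one predictor approaches a linear combination of the others.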
... Mixed-effects logistic regression analyses (Baayen, 2008) were carried out in R version 4.0.0 (R Core Team, 2020), using the function glmer from the lme4 package (Bates et al., 2015). Guided by the above hypotheses, the models were fitted with the predictors (fixed effects) of Age (four levels: four, six, eight, and ten years) and Language Acquisition (English monolingual, English bilingual, French monolingual, and French bilingual). All categorical variables were coded as dichotomous dummy variables (1 vs. 0). ...
Previous research on the L1 acquisition of motion event expression suggests that mapping multiple semantic components onto syntactic units is associated with greater difficulties in verb-framed than in satellite-framed languages, because the former require more complex structures (using subordination). This study investigated the impact of this language-specific difference in English-French bilingual children's caused motion expressions. 2L1 children (n = 96) between 4 and 10 years and monolingual English and French children (n = 96) described video animations portraying caused motion events involving multiple semantic components. Results revealed reduced rates of subordinate constructions in bilinguals' French descriptions, and more so in older than younger children, while English responses aligned with monolinguals. Semantic density of responses strongly predicted syntactic complexity, but exclusively in French. These asymmetric findings indicate a task-specific syntactic relief strategy and are discussed in the context of theoretical claims about universal biases of event encoding and bilingual-specific optimisation strategies.
... For statistics, an LMM was fitted with "Group" and "Russian sound" as the fixed effects, "Subject" as the random intercept; K′ values were calculated as the dependent variable. Nine outliers with large-scaled residuals (over 2.5 SD) were excluded via model-based trimming (Baayen 2008). The visual inspection of Q-Q plots and plots of residuals revealed no obvious deviations. ...
This study explored the perceptual assimilation and discrimination of Russian phonemes by three groups of Chinese listeners with differing Russian learning experience. A perceptual assimilation task (PAT) and a perceptual discrimination test (PDT) were conducted to investigate if/how L1-L2 perceptual similarity would vary as a function of increased learning experience, and the development of assimilation-discrimination relations. The PAT was analyzed via assimilation rates, dispersion K' values, goodness ratings and assimilation patterns. Results revealed an intriguing phenomenon that the perceived Mandarin-Russian similarity first increased from naïve listeners to intermediate learners and then decreased slightly in relatively advanced learners. This suggests that L1-L2 perceptual similarity is subject to learning experience and could follow a potential "rise and fall" developmental pattern. The PDT results were mostly in line with the assimilation-discrimination correspondence with more experience bringing out better discriminability in general. Yet the overall sensitivity d' values from the Chinese groups were relatively low, implying acoustic/articulatory effects on L2 discriminability aside from perceptual assimilation. The results were discussed under the frameworks of L2 Perceptual Assimilation Model, Speech Learning Model and L2 Linguistic Perception Model.
... Since the groups of children were not equal (10 children with CI and 30 children with NH) and the number of extracted words varied for each child, the MSP was normalized by implementing a bootstrapping procedure (Baayen, 2008; Molemans, Van den Berg, Van Severen, & Gillis, 2012). The bootstrapping procedure was used to establish the entropy of each lemma for each child at each month from word birth. ...
The inflectional diversity of parents’ speech directed to children acquiring Dutch was investigated. Inflectional diversity is defined as the number of inflected forms of a particular lemma (e.g. singular, plural of a noun) and measured by means of Mean Size of Paradigm (MSP). Changes in the inflectional diversity of infant directed speech (IDS) were analyzed as a function of children’s developing linguistic abilities. Two types of changes in the inflectional diversity of nouns and verbs were analyzed: (1) coarse tuning: changes relative to children’s growing vocabulary and (2) fine lexical tuning: changes relative to children’s use of specific lexical items. In addition, it was investigated if those changes were similar depending on particular characteristics of the children, namely, differences in their hearing abilities. Longitudinal recordings of spontaneous speech of 30 children (0;6-2;0) with normal hearing (NH) and 10 hearing-impaired children with a cochlear implant (CI) (0;6-2;6), and their parents were analyzed. As to coarse tuning, it was found that the inflectional diversity of IDS decreased at the beginning of the child’s lexical development but increased again parallel to infants’ growing cumulative vocabulary. As to fine lexical tuning, IDS showed less inflectional diversity before each child’s first use of a word and gradually more inflectional diversity afterward. In addition, parents of children with CI used less inflectionally diverse speech than parents of children with NH, which suggests an adaptation to specific characteristics of the children. In conclusion, inflectional morphology in IDS appears to be tuned to children’s hearing status and linguistic knowledge.
... Data were collected for 4060 observations, but an initial phase of outlier trimming removed invalid trials. Baayen (2008) explains that extremely fast response times (RTs) signify non-engaged, automatic button-pushing, and extremely slow responses signify confusion or distraction. Accordingly, 80 observations (1.97% of the data) were removed with response times below 200 ms or above 4000 ms. ...
This research investigated two aspects of second language learning: how implicit knowledge develops through explicit learning and how this is affected by multiword expression compositionality. More specifically, the experiment investigated how flashcard learning affected the implicit knowledge development of literal and figurative expressions. As these two types are composed differently, it was hypothesized that their implicit knowledge development would likewise differ. A lexical decision task was conducted in a masked repetition priming experiment to measure implicit knowledge gains, and response time data were analyzed in a linear mixed-effects model with participants and items set as random effects. Results showed that flashcard learning affected the implicit knowledge development of figurative and literal expressions differently. Keywords: explicit learning; flashcards; implicit knowledge; interface; multiword expressions
... I follow Baayen (2008) in subscribing to a modern type of exploratory data analysis. In this approach we allow for the possibility that not all of the patterns in the data can be explained by an a priori formulated theory. ...
This book offers a comparative perspective on the structural and interpretive properties of root-clause complementizers in Ibero-Romance. The driving question the author seeks to answer is where the boundaries between syntax and pragmatics lie in these languages. Contrary to most previous work on these phenomena, the author argues in favor of a relatively strict distribution of labor between the two components of grammar. The first part of the book is devoted to root complementizers with a reportative interpretation. The second part deals with root complementizers and commitment attribution. Finally, the last part presents the results of empirical studies on the topic.
... The predictors included in the models as fixed effects were the ( Final models were stepped down from this full model, and through analysis and comparison of possible models and the use of model criticism (Baayen 2008), we reached the models that most accurately predict the variation in the data given the predictors included in this study. ...
This chapter reports on an indexicality study of standard and regional forms of the past participle of strong verbs in western Denmark (Jutland). The last decades have seen a strong standardisation process with respect to language use, and we test the hypothesis that this is driven by the social meanings associated with this morphological variation. We conducted an online matched guise experiment with controlled variation of the participle suffix using recordings with speakers from Eastern Jutland. 262 respondents evaluated the speakers on scales related to personality traits, social background and standardness. The results show that the most regional variant leads to a lower score in perceived standardness but, for younger respondents, a higher score in how “nice” the speaker seems.
... In analysis one, we tested whether distributor, partner presence and/or their interaction had an effect on monkeys' food refusal behaviour during test conditions. To this end we constructed Model 1A, a generalized linear mixed model (GLMM; [35]; output in tables 1 and 2). The response variable was food refusal (yes/no), and hence the model was fitted with binomial error structure and logit link function [36]. ...
Protest in response to unequal reward distribution is thought to have played a central role in the evolution of human cooperation. Some animals refuse food and become demotivated when rewarded more poorly than a conspecific, and this has been taken as evidence that non-human animals, like humans, protest in the face of inequity. An alternative explanation—social disappointment—shifts the cause of this discontent away from the unequal reward, to the human experimenter who could—but elects not to—treat the subject well. This study investigates whether social disappointment could explain frustration behaviour in long-tailed macaques, Macaca fascicularis. We tested 12 monkeys in a novel ‘inequity aversion’ paradigm. Subjects had to pull a lever and were rewarded with low-value food; in half of the trials, a partner worked alongside the subjects receiving high-value food. Rewards were distributed either by a human or a machine. In line with the social disappointment hypothesis, monkeys rewarded by the human refused food more often than monkeys rewarded by the machine. Our study extends previous findings in chimpanzees and suggests that social disappointment plus social facilitation or food competition effects drive food refusal patterns.
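The excerpt above describes a model for a yes/no response fitted with binomial error structure and a logit link. The link itself is easy to show in isolation: it maps a probability to the log-odds scale on which the model's coefficients are additive, and its inverse maps a linear predictor back to a probability. The sketch below covers only that mapping, not the mixed model; the coefficient values are invented for illustration and are not results from the cited study.

```python
import math


def logit(p: float) -> float:
    """Log-odds of a probability p, with 0 < p < 1."""
    return math.log(p / (1.0 - p))


def inv_logit(eta: float) -> float:
    """Probability corresponding to a linear predictor eta (numerically stable)."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)
    return e / (1.0 + e)


if __name__ == "__main__":
    # Hypothetical coefficients on the log-odds scale (not from the study).
    intercept = -1.5         # baseline condition
    effect_of_factor = 0.8   # shift for the other level of a two-level factor
    print("baseline probability:", inv_logit(intercept))
    print("probability with effect:", inv_logit(intercept + effect_of_factor))
```

A constant shift on the log-odds scale produces different changes in probability depending on the baseline, which is why such effects are usually reported as log-odds or odds ratios.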
... Only RTs associated with a correct answer were considered for analyses (83% of the trials). Outliers were removed via model criticism (2.5 SD of standardized residuals: Baayen, 2008; Ch. 6.2.3). ...
Object relatives are more difficult to process than subject relatives. Several sentence processing models have been proposed to explain this difference. As double-center embedding relatives contain several long-distance dependencies, they are an ideal configuration to compare sentence processing models. The main aim of the present study was to compare the predictions of the featural Relativized Minimality approach with the ones of other relevant sentence processing models.
57 Italian-speaking healthy adults answered comprehension questions concerning the first, second, or third verb to appear in both double-center embedding and control sentences. Results show that questions concerning the matrix verb of double-center embedding structures were significantly easier and were associated with faster response times than questions concerning the embedded verbs. Furthermore, in object double-center embedding relatives the questions concerning the verb of the most embedded clause were easier than the ones concerning the verb of the intermediate embedded clause.
This pattern of results is consistent with featural Relativized Minimality but cannot be fully explained by other sentence processing models.
... Again, we used the AIC to determine whether adding a random slope for Voicing for both random intercepts improved the model. We followed the model criticism procedure by Baayen (2008) and assessed our models on autocorrelation, multicollinearity, normality of the residuals, and heteroscedasticity using the car package version 3.0.12 (Fox & Weisberg, 2019). ...
Introduction: Surgical treatment of oral cancer leads to lasting changes of the vocal tract and individuals treated for oral cancer (ITFOC) often experience speech problems. The purpose of this study was to analyse the acoustic properties of the spontaneous speech of individuals who were surgically treated for oral cancer. It was investigated (1) how key spectral measures of articulation change post-treatment; (2) whether changes are more related to target manner or place of articulation; and (3) how spectral measures develop at various time points following treatment.
Method: A corpus consisting of 32.850 tokens was constructed by manually segmenting the speech of five (four female - one male) American English speaking ITFOC. General acoustic characteristics (duration and spectral tilt), plosives (burst frequency), fricatives (centre of gravity and spectral skewness), and vowels (F1 and F2) were analysed using linear mixed effects regression and compared to control speech. Moreover, a within speaker analysis was performed for speakers with multiple recordings.
Results: Manner of articulation is more predictive of post-treatment changes than place of articulation. Compared to controls, ITFOC produced the fricatives /f, v, θ, ð, s, z, ʃ, ʒ/ with a lower centre of gravity while no differences were found in plosives and vowels. Longitudinal analyses show high within-speaker variation, but general improvements one-year post-treatment.
Conclusions: Surgical oral cancer treatment changes the spectral properties of speech. Fricatives with varying manner of articulations were distorted, suggesting that manner of articulation is more predictive than place of articulation in identifying general problem areas for ITFOC.
... A Fisher's Exact Test using R code programming language plotted the multidimensional scale (MDS); the test is calibrated to measure small samples (McDonald, 2014; Baayen, 2008). ...
This study examined the morphosyntactic functions of Waray substantive lexical items and sought to answer whether they are categorized, precategorial, or variable; using a corpus under the lens of Basic Linguistic Theory (Dixon, 2010). The first step involved a review of the weaknesses of the absolute category and precategorial positions. A presentation of data on the validity of variability position using induction by simple enumeration (qualitative evidence) supported the review. Next, the presentation of quantitative data established the variability of Waray substantive lexical items. The data consisting of Waray roots underwent a pilot test and adjustments to determine the final data pool of Waray roots. Three independent auditors conducted data validation. Statisticians plotted the data on a multidimensional scale (MDS) and triangulated the results. The researcher analyzed and interpreted the data. The study shows that Waray roots are variable; however, they can be classified, with or without affixes, based on their referential, predicative, or modificative functions observed in their actual usage in the corpus. The results entail a new scheme in the organization of word classes first articulated by Dixon (2010). This study proposes a new model for tagging of Waray roots, inflected forms, and those with stem-forming affixes doing away with the traditional part-of-speech tag such as noun (n.), verb (v.), adverb (adv.), and adjective (adj.). Keywords: Waray, morphosyntax, morphosyntactic functions, corpus study, lexical items, lexicography
... For both the ''early-life model'' and ''current maternal model'', we used Generalized Linear Mixed Models (GLMMs (Baayen, 2008)) with negative binomial error structure and logit link function (see SOM for further detail on model choice) using the function ''glmer'' of the R package lme4 (version 1.121; (Bates et al., 2015) with the optimizer set to ''bobyqa''). We confirmed that this error distribution was the best fit to our data using simulations (see SOM). ...
Early-life experiences, such as maternal care received, influence adult social integration and survival. We examine what changes to social behaviour through ontogeny lead to these lifelong effects, particularly whether early-life maternal environment impacts the development of social communication. Chimpanzees experience prolonged social communication development. Focusing on a central communicative trait, the ‘pant-hoot’ contact call used to solicit social engagement, we collected cross-sectional data on wild chimpanzees (52 immatures and 36 mothers). We assessed early-life socioecological impacts on pant-hoot rates across development, specifically: mothers’ gregariousness, age, pant-hoot rates and dominance rank, maternal loss and food availability, controlling for current maternal effects. We found that early-life maternal gregariousness correlated positively with offspring pant-hoot rates, whilst maternal loss led to reduced pant-hoot rates across development. Males had steeper developmental trajectories in pant-hoot rates than females. We demonstrate the impact of maternal effects on developmental trajectories of a rarely investigated social trait, vocal production.
... We excluded all the erroneous trials (2.15%), and trials with naming latencies shorter than 200 ms and longer than 2000 ms or beyond three standard deviations from the mean naming latencies of each condition for each participant (1.79%) (see a similar approach in Qu et al., 2020). The remaining naming latencies were log-transformed to reduce right skewness (Baayen, 2008) and then analyzed by the linear mixed-effect modeling (LMM) using lme4 package (Bates et al., 2015b) in R (R Core Team, 2021), with p-values estimated via the Satterthwaite approximation method implemented in the lmerTest package (Kuznetsova et al., 2017). Moreover, the raw naming latencies obtained from the behavioral data pre-processing were used to generate the speech onset markers, which were temporally aligned with the EEG signal for each participant. ...
Pronunciation of words or morphemes may vary systematically in different phonological contexts, but it remains unclear how different levels of phonological information are encoded in speech production. In this study, we investigated the online planning process of Mandarin Tone 3 (T3) sandhi, a case of phonological alternation whereby a low-dipping tone (T3) changes to a Tone 2 (T2)-like rising tone when followed by another T3. To examine the time course of the encoding of the abstract category-level (underlying form) and context-specific phonological form (surface form) of T3, we conducted an electroencephalographic (EEG) study with a phonologically-primed picture naming task and examined the event-related potentials (ERPs) time-locked to the stimulus onset as well as speech response onset. The behavioral results showed that targets primed by T3 or T2 primes yielded shorter naming latencies than those primed by control primes. Importantly, the EEG data revealed that T3 primes elicited larger positive amplitude over broad frontocentral regions roughly in the 320–550 ms time window of stimulus-locked ERP and −500 to −400 ms time window of response-locked ERP, whereas T2 primes elicited larger negative amplitude over left frontocentral regions roughly in the −240 to −100 ms time window of response-locked ERP. These results indicate that the underlying and the surface form are encoded at different processing stages. The former presumably occurs in the earlier phonological encoding stage, while the latter probably occurs in the later phonetic encoding or motor preparation stage. The current study offers important implications for understanding the processing of phonological alternations and tonal encoding in Chinese word production.
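The excerpt above log-transforms naming latencies to reduce right skewness before fitting the mixed model. The effect of that step can be sketched by comparing a simple moment-based skewness coefficient before and after the transform. The latencies below are invented for illustration, and the skewness formula (third central moment divided by the 1.5 power of the second) is one common choice rather than anything prescribed by the cited study.

```python
import math
from statistics import mean


def skewness(xs):
    """Moment-based sample skewness: m3 / m2 ** 1.5 (population moments)."""
    n = len(xs)
    center = mean(xs)
    m2 = sum((x - center) ** 2 for x in xs) / n
    m3 = sum((x - center) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5


if __name__ == "__main__":
    # Hypothetical naming latencies in ms with a long right tail.
    latencies = [620, 650, 680, 700, 720, 760, 810, 900, 1100, 1600]
    log_latencies = [math.log(x) for x in latencies]
    print("skewness, raw:", round(skewness(latencies), 3))
    print("skewness, log:", round(skewness(log_latencies), 3))
```

The log compresses large values more than small ones, so the long right tail typical of latency data is pulled in and the residuals of a linear model tend to sit closer to normality.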
... The intuition is to start from a simple pairwise comparison of two levels of a bias variable (cf. the first and second column in Table 1) and add covariates to see whether the effect of the bias variable remains unaffected. This procedure has become standard in the last decade in neighboring fields like linguistics and psychology which have moved from significance tests (Student's t-test, analysis of variance) to the family of multivariate regression models (Bresnan et al., 2007; Baayen, 2008; Jaeger, 2008; Snijders and Bosker, 2012). Regression models estimate the relationships between the dependent (previously called observed) variable – in this case, system performance – and one or more independent variables – in this case, the putative bias variable and its covariates, each of which is assigned a direction and a significance. ...
In recent years, there has been an increasing awareness that many NLP systems incorporate biases of various types (e.g., regarding gender or race) which can have significant negative consequences. At the same time, the techniques used to statistically analyze such biases are still relatively simple. Typically, studies test for the presence of a significant difference between two levels of a single bias variable (e.g., male vs. female) without attention to potential confounders, and do not quantify the importance of the bias variable. This article proposes to analyze bias in the output of NLP systems using multivariate regression models. They provide a robust and more informative alternative which (a) generalizes to multiple bias variables, (b) can take covariates into account, (c) can be combined with measures of effect size to quantify the size of bias. Jointly, these effects contribute to a more robust statistical analysis of bias that can be used to diagnose system behavior and extract informative examples. We demonstrate the benefits of our method by analyzing a range of current NLP models on one regression task and one classification task (emotion intensity prediction and coreference resolution, respectively).
Conservation of biodiversity requires in-depth knowledge of trait-environment interactions to understand the influence the environment has on species assemblages. Saproxylic beetles exhibit a wide range of traits and functions in the forest ecosystems. Understanding their responses to surrounding environment thus improves our capacity to identify habitats that should be restored or protected. We investigated potential interactions between ecological traits in saproxylic beetles (feeding guilds and habitat preferences) and environmental variables (deadwood, type and age of surrounding forest). We sampled beetles from 78 plots containing newly created high stumps of Scots pine and Silver birch in boreal forest landscapes in Sweden for three consecutive years. Using a model based approach, our aim was to explore potential interactions between ecological traits and the surrounding environment at close and distant scale (20 m and 500 m radius). We found that broadleaf-preferring beetle species are positively associated with the local broadleaf-originated deadwood and broadleaf-rich forests in the surrounding landscapes. Conifer-preferring species are positively associated with the local amount of coniferous deadwood and young and old forests in the surrounding landscape. Fungivorous and predatory beetles are positively associated with old forests in the surrounding landscapes. Our results indicate that both local amounts of deadwood and types of forests in the landscape are important in shaping saproxylic beetle communities. We particularly highlight the need to increase deadwood amounts of various qualities in the landscape, exempt older forests from production and to increase broadleaf-rich habitats in order to meet different beetle species' habitat requirements. 
Trait responses among saproxylic beetles provide insights into the significance of broadleaf forest and dead wood as essential attributes in boreal forest restoration, which helps conservation planning and management in forest landscapes.
The current study investigates the probabilistic conditioning of the Mandarin locative alternation. We adopt a corpus-based multivariate approach to analyze 2,836 observations of locative variants from a large Chinese corpus and annotated manually for various language-internal and language-external constraints. Multivariate modeling reveals that the Mandarin locative alternation is not only influenced by semantic predictors like affectedness and telicity, but also by previously unexplored syntactic and language-external constraints, such as complexity and animacy of locatum and location, accessibility of locatum, pronominality, definiteness of location, length ratio and register. Notably, the effects of affectedness, definiteness and pronominality are broadly parallel in both the Mandarin locative alternation and its English counterpart. We thus contribute to theorizing in corpus-based variationist linguistics by uncovering the probabilistic grammar of the locative alternation in Mandarin Chinese, and by identifying the constraints that may be universal across languages.
We investigate how three adult groups – experienced L2 English listeners; experienced D2 (second dialect) listeners; and native L1/D1 listeners – categorise Australian English (AusE) lax front vowels /ɪ e æ/ in /hVt/, /hVl/ and /mVl/ environments in a forced-choice categorisation task of synthesised continua. In study 1, AusE listeners show predictable categorisations, with an effect of coarticulation raising the vowel in perception for nasal onset stimuli, and a following lateral lowering the vowel in perception. In study 2, Irish (D2) and Chinese listeners (L2) have different categorisations than AusE listeners, likely guided by their D1/L1. Coarticulation influences the D1/D2 groups in similar ways, but results in more difficulty and less agreement for the Chinese. We also investigate the role of extralinguistic factors. For the Chinese listeners, higher proficiency in English does not correlate with more Australian-like categorisation behaviour. However, having fewer Chinese in their social network results in more Australian-like categorisation for some stimuli. These findings lend partial support to the role of experience and exposure in L2/D2 contexts, whereby categorisation is likely still driven by native categories, with increased exposure leading to better mapping, but not to a restructuring of underlying phonetic categories.
This paper presents a quantitative analysis of the representation and affective encoding of fictional space in a corpus of 125 Swiss literary prose texts of the 19th and early 20th Century written in German, offering a contribution to both spatial and affective literary studies.
Motivated by questions about the iconic dichotomy between ‘urban’ and ‘rural/natural’ space in literary works (Sengle; Fournier; Nell and Weiland) – and in Swiss literature around 1900 in particular (Rehm) – we use computational methods to detect and examine how different types of space are distributed and affectively encoded in German-Swiss literature. Taking into account the complexity of cultural perceptions and representations of space across history, we examine the presence of ‘urban’ and ‘rural/natural’ fictional spaces and their potential role in constructing a ‘Swiss’ national literature (Böhler; Zimmer), and their affective encoding.
In order to do this, we first compiled a comprehensive dictionary of named and non-named spatial entities in the broad spatial categories RURAL and URBAN, and examined the presence of sentiment and emotions (valence and discrete emotions) and their ‘strength’ (arousal) in relation to these. We used current state-of-the-art sentiment lexicons for German available to the digital humanities community. Similarly to Heuser et al., we mapped the spatial entities and the sentiment lexicons onto our corpus, and focused on spans of +/-50 words around the detected entities, in order to examine the specific sentiment and emotions related to space.
In an exploratory analysis, we offer here a first-time data-driven perspective on rural and urban fictional space, incorporating the dimension of affective encoding of space systematically.
Research on the vocal behaviour of non-human primates is often motivated by a desire to understand the origins of semantic communication, which led to a partial separation of this research from ecological-evolutionary approaches. To bridge this gap, we returned to the textbook example of semantic communication in animals, the vervet monkey, Chlorocebus pygerythrus, alarm call system, and investigated whether male alarm barks fulfil a dual function of alarming and indicating male quality. Barks are loud calls, produced by adult males, in response to large carnivores. However, since barks occasionally occur in agonistic interactions, we investigated whether barks may also indicate male quality. We recorded natural barking events over 23 months, sampling individual male participation from 45 individuals in six free-ranging groups at the Mawana Game Reserve, KwaZulu-Natal, South Africa. We hypothesized that barking frequency is under intra-sexual selection and predicted that barking frequency would increase with male rank and the degree of male-male competition. We found that the highest-ranking males were more likely to produce barks than lower-ranking males and that the number of daily barking events increased during the mating season. We advocate studying primate communication in its evolutionary context to achieve a comprehensive understanding of call 'meaning'.
Even after the 150th anniversary of sexual selection theory, the drivers and mechanisms of female sexual selection remain poorly studied. To understand demographic circumstances favoring female-female competition, trade-offs with kin selection and interactions with male reproductive strategies, we investigated female evictions in redfronted lemurs (Eulemur rufifrons). Based on 24 years of demographic data of known individuals, we show that female redfronted lemurs target close female kin for forcible, permanent, and presumably lethal eviction, even though groups contain multiple unrelated males whose voluntary emigration actually mitigated the probability of future female evictions. Female eviction and male emigration were predicted by group size, but male emigration was primarily driven by a proportional increase of male rivals. Female evictions were more likely than male emigrations when there were more juvenile females in a group, but the identity of evicted females was not predicted by any intrinsic traits. While birth rates were reduced by the number of juvenile females, they were higher when there were more adult females in a group and in years with more rainfall. Early infant survival was reduced with increasing numbers of juvenile females, but variation in female lifetime reproductive success was not related to any of the predictors examined here. Thus, there seems to be a limit on female group size in this lemur species. More generally, our study demonstrates a balanced interplay between female reproductive competition, competition over group membership between both sexes, and kin selection, contributing new insights into the causes and consequences of female competition in animal societies.
Significance statement
The evolutionary causes of female competition in vertebrate societies remain poorly known. Evictions represent an extreme form of female competition because even close kin are evicted when same-sized unrelated males are theoretically also available as victims. We studied drivers and consequences of evictions in redfronted lemurs (Eulemur rufifrons) using 24 years of demographic data from multiple groups. We show that while voluntary male emigration mitigates the probability of future female evictions, females nonetheless appear to accept the fitness costs of evicting female kin. While group size seems to be the main driver of departures by either sex, the number of juvenile females present in groups is the key variable triggering eviction events as well as physiological responses that could be interpreted as female reproductive restraint. Our study therefore revealed that competition does trump cooperation under some circumstances in the intricate interplay between sexual selection and kin selection on females.
This paper investigates the acoustic correlates of stress in European Portuguese. Using a nonce word experiment, this study controls the phonological environment of the stimuli so stressed and unstressed vowels with the same quality can be directly compared. Of the five acoustic measures examined, duration is the most robust correlate of stress, but the effect is limited to certain vowels and speakers. Care is taken to separate the effects of independent phonological processes on acoustic properties that are also influenced by stress.
Linguistic complexity is a complex phenomenon, as it manifests itself on different levels (complexity of texts to sentences to words to subword units), through different features (genres to syntax to semantics), and also via different tasks (language learning, translation training, specific needs of other kinds of audiences). Finally, the results of complexity analysis will differ for different languages, because of their typological properties, the cultural traditions associated with specific genres in these languages or just because of the properties of individual datasets used for analysis. This paper investigates these aspects of linguistic complexity through using artificial neural networks for predicting complexity and explaining the predictions. Neural networks optimise millions of parameters to produce empirically efficient prediction models while operating as a black box without determining which linguistic factors lead to a specific prediction. This paper shows how to link neural predictions of text difficulty to detectable properties of linguistic data, for example, to the frequency of conjunctions, discourse particles or subordinate clauses. The specific study concerns neural difficulty prediction models which have been trained to differentiate easier and more complex texts in different genres in English and Russian and have been probed for the linguistic properties which correlate with predictions. The study shows how the rate of nouns and the related complexity of noun phrases affect difficulty via statistical estimates of what the neural model predicts as easy and difficult texts. The study also analysed the interplay between difficulty and genres, as linguistic features often specialise for genres rather than for inherent difficulty, so that some associations between the features and difficulty are caused by differences in the relevant genres.
The bushmeat trade provides an income to hunters, transporters, and vendors living in the vicinity of protected areas but remains a challenge to wildlife conservation objectives. The key factors driving the source, choice and use of bushmeat vary among actors in the commercial bushmeat value chain, and insights into these determinants are required to facilitate the development of conservation strategies. Therefore, we aimed to identify the socioeconomic factors that explain the source of supply and quantities of bushmeat available in households and local restaurants. We carried out a survey with 144 rural household heads and 24 restaurant owners in 20 villages in the Western part of Taï National Park in Côte d’Ivoire. We found that bushmeat quantity and species diversity were low in households, originating mainly from subsistence hunting. However, both the amount of bushmeat and the variety of species were high in restaurants and primarily supplied by commercial hunters. Furthermore, the quantity of bushmeat was lower in households with other protein sources and in restaurants in villages that had been the target of more conservation awareness campaigns. We highlight the importance of understanding the determinants of bushmeat supply to regulate the bushmeat trade by applying relevant conservation interventions.
Introduction
To explore human-canid relationships, we tested similarly socialized and raised dogs (Canis familiaris) and wolves (Canis lupus) and their trainers in a wildlife park. The aims of our study were twofold: first, we aimed to test which factors influenced the relationships that the trainers formed with the dogs or wolves and second, we investigated if the animals reacted to the trainers in accordance with the trainers’ perceptions of their relationship.
Methods
To achieve these goals, we assessed the relationships using a human-animal bonds survey, which the trainers used to rate the bonds between themselves and their peers with the canids, and by observing dyadic trainer-canid social interactions.
Results
Our results, which are preliminary given the small sample size and the set-up of the research center, demonstrate that our survey was a valid way to measure these bonds, since trainers seem to perceive and agree on the strength of their bonds with the animals and that of their fellow trainers. Moreover, the strength of the bond as perceived by the trainers was mainly predicted by whether or not the trainer was a hand-raiser of the specific animal, but not by whether or not the animal was a wolf or a dog. In the interaction test, we found that male animals and animals the trainers felt more bonded to spent more time in proximity of and in contact with the trainers; there was no difference based on species.
Discussion
These results support the hypothesis that wolves, similarly to dogs, can form close relationships with familiar humans when highly socialized (Canine Cooperation Hypothesis). Moreover, as in other studies, dogs showed more submissive behaviors than wolves and did so more with experienced than less experienced trainers. Our study suggests that humans and canines form differentiated bonds with each other that, if close, are independent of whether the animal is a wolf or dog.
Theoretical linguists have traditionally relied on linguistic intuitions such as grammaticality judgments for their data. But the massive growth of computer-readable texts and recordings, the availability of cheaper, more powerful computers and software, and the development of new probabilistic models for language have now made the spontaneous use of language in natural settings a rich and easily accessible alternative source of data.
Surprisingly, many linguists believe that such ‘usage data’ are irrelevant to the theory of grammar. Four problems are repeatedly brought up in the critiques of usage data:
1. correlated factors seeming to support reductive theories,
2. pooled data invalidating grammatical inference,
3. syntactic choices reducing to lexical biases, and
4. cross-corpus differences undermining corpus studies.
Presenting a case study of work on the English dative alternation, we show first, that linguistic intuitions of grammaticality are deeply flawed and seriously underestimate the space of grammatical possibility, and second, that the four problems in the critique of usage data are empirical issues that can be resolved by using modern statistical theory and modelling strategies widely used in other fields.
The new models allow linguistic theory to solve more difficult problems than it has in the past, and to build convergent projects with psychology, computer science, and allied fields of cognitive science.
Among the most fascinating data for phonology are those showing how speakers incorporate new words and foreign words into their language system, since these data provide cues to the actual principles underlying language. In this article, we address how speakers deal with neutralized obstruents in new words. We formulate four hypotheses and test them on the basis of Dutch word-final obstruents, which are neutral for (voice). Our experiments show that speakers predict the characteristics of neutralized segments on the basis of phonologically similar morphemes stored in the mental lexicon. This effect of the similar morphemes can be modeled in several ways. We compare five models, among them STOCHASTIC OPTIMALITY THEORY and ANALOGICAL MODELING OF LANGUAGE; all perform approximately equally well, but they differ in their complexity, with analogical modeling of language providing the most economical explanation.
This study investigates whether regular morphologically complex neologisms leave detectable traces in the mental lexicon. Experiment 1 (subjective frequency estimation) was a validation study for our materials. It revealed that semantic ambiguity led to a greater reduction of the ratings for neologisms compared to existing words. Experiment 2 (visual lexical decision) and Experiment 3 (self-paced reading in connected discourse) made use of long-distance priming. In both experiments, the prime (base or neologism) was followed after 39 intervening trials by the neologism. As revealed by mixed-effect analyses of covariance, the target neologisms elicited shorter processing latencies in the identity priming condition compared to the condition in which the base word had been read previously, indicating an incipient facilitatory frequency effect for the neologism.
Despite pleas from methodologists, researchers often continue to dichotomize continuous predictor variables. The primary argument against this practice has been that it underestimates the strength of relationships and reduces statistical power. Although this argument is correct for relationships involving a single predictor, a different problem can arise when multiple predictors are involved. Specifically, dichotomizing 2 continuous independent variables can lead to false statistical significance. As a result, the typical justification for using a median split as long as results continue to be statistically significant is invalid, because such results may in fact be spurious. Thus, researchers who dichotomize multiple continuous predictor variables not only may lose power to detect true predictor–criterion relationships in some situations but also may dramatically increase the probability of Type I errors in other situations.
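The single-predictor argument in the abstract above (a median split underestimates the strength of a relationship) can be made concrete with a small sketch. The data are invented and the helper names `pearson_r` and `median_split` are hypothetical; the sketch does not reproduce the two-predictor Type I error result, which would need a simulation study.

```python
import numpy as np

def pearson_r(a, b):
    """Pearson correlation between two equally long sequences."""
    return float(np.corrcoef(a, b)[0, 1])

def median_split(values):
    """Dichotomize a continuous variable: 1.0 above the median, 0.0 otherwise."""
    values = np.asarray(values, dtype=float)
    return (values > np.median(values)).astype(float)

# Invented data: y is a perfect linear function of x.
x = np.arange(1, 9, dtype=float)   # 1, 2, ..., 8
y = x.copy()

r_continuous = pearson_r(x, y)           # relationship with x kept continuous
r_split = pearson_r(median_split(x), y)  # same relationship after a median split

print("continuous predictor:", round(r_continuous, 4))
print("median-split predictor:", round(r_split, 4))
```

Even for a perfectly linear relationship, the correlation drops to roughly 0.87 after the split, because the dichotomized predictor discards all variation within each half.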
It is widely believed that the difference between regular and irregular verbs is restricted to form. This study questions this belief. We report a series of lexical statistics showing that irregular verbs have a greater density in semantic space. Irregular verbs tend to have greater semantic neighborhoods containing relatively many other irregulars compared to regulars. We show that this greater semantic density for irregulars is reflected in association norms, familiarity ratings, visual lexical decision latencies, and word naming latencies. Meta-analyses of the materials of two neuroimaging studies show that in these studies, regularity is confounded with differences in semantic density. Our results challenge the hypothesis of the supposed formal encapsulation of rules of inflection, and support lines of research in which sensitivity to probability is recognized as intrinsic to human language.
The productivity of English derivational affixes is studied as a function of text type. Principal component analyses show that texts can be classified adequately, not only on the basis of the relative frequencies of the highest frequency words (Burrows, 1992, 1993), but also on the basis of the productivity of derivational affixes. Stylistically heterogeneous texts are clustered into text types, stylistically homogeneous texts cluster in the time dimension, allowing diachronic changes in productivity to be traced. Supplementary analyses on the basis of the relative frequencies of function words support the morphology‐based clusterings. The role and marked nature of the nonnative stratum of the lexicon is discussed in detail, as well as the way in which the rival affixes ‐ness and ‐ity, and un‐ and in‐, are put to use. The results obtained show that any theory of morphological productivity that does not take stylistic factors into account is incomplete.
This paper presents a numeric and information theoretic model for the measuring of language change, without specifying the particular type of change. It is shown that this measurement is intuitively plausible and that meaningful measurements can be made from as few as 1000 characters. This measurement technique is extended to the task of determining the "rate" of language change based on an examination of brief excerpts from the National Geographic Magazine and determining both their linguistic distance from one another as well as the number of years of temporal separation. A statistical analysis of these results shows, first, that language change can be measured, and second, that the rate of language change has not been uniform, and that in particular, the period 1939-1948 had particularly slow change, while 1949-1958 and 1959-1968 had particularly rapid changes.
In this paper, some electronically gathered data are presented and analyzed about the presence of the past in newspaper texts. In ten large text corpora of six different languages, all dates in the form of years between 1930 and 1990 were counted. For six of these corpora this was done for all the years between 1200 and 1993. Depicting these frequencies on the timeline, we find an underlying regularly declining curve, deviations at regular places and culturally determined peaks at irregular points. These three phenomena are analyzed.
Mathematically speaking, all the underlying curves have the same form. Whether a newspaper gives much or little attention to the past, the distribution of this attention over time turns out to be inversely proportional to the distance between past and present. It is shown that this distribution is largely independent of the total number of years in a corpus, the culture in which it is published, the language and the date of origin of the corpus. The phenomenon is explained as a kind of forgetting: the larger the distance between past and present, the more difficult it is to connect something of the past to an item in the present day. A more detailed analysis of the data shows a breakpoint in the frequency vs. distance from the publication date of the texts. References to events older than approximately 50 years are the result of a forgetting process that is distinctively different from the forgetting speed of more recent events.
Pandel's classification of the dimensions of historical consciousness is used to answer the question how these investigations elucidate the historical consciousness of the cultures in which the newspapers are written and read.
A well-known problem in the domain of quantitative linguistics and stylistics concerns the evaluation of the lexical richness of texts. Since the most obvious measure of lexical richness, the vocabulary size (the number of different word types), depends heavily on the text length (measured in word tokens), a variety of alternative measures has been proposed which are claimed to be independent of the text length. This paper has a threefold aim. Firstly, we have investigated to what extent these alternative measures are truly textual constants. We have observed that in practice all measures vary substantially and systematically with the text length. We also show that in theory, only three of these measures are truly constant or nearly constant. Secondly, we have studied the extent to which these measures tap into different aspects of lexical structure. We have found that there are two main families of constants, one measuring lexical richness and one measuring lexical repetition. Thirdly, we have considered to what extent these measures can be used to investigate questions of textual similarity between and within authors. We propose to carry out such comparisons by means of the empirical trajectories of texts in the plane spanned by the dimensions of lexical richness and lexical repetition, and we provide a statistical technique for constructing confidence intervals around the empirical trajectories of texts. Our results suggest that the trajectories tap into a considerable amount of authorial structure without, however, guaranteeing that spatial separation implies a difference in authorship.
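The dependence of vocabulary-based measures on text length, described in the abstract above, can be illustrated with the simplest such measure, the type-token ratio. The sketch below is an assumption-laden illustration: the sample sentence is invented, whitespace tokenization is a simplification, and `type_token_ratio` is a hypothetical helper, not one of the measures evaluated in the paper.

```python
def type_token_ratio(tokens):
    """Number of distinct word types divided by the number of word tokens."""
    if not tokens:
        raise ValueError("need at least one token")
    return len(set(tokens)) / len(tokens)

# Invented sample text, tokenized naively on whitespace.
text = "the cat sat on the mat and the dog sat on the rug"
tokens = text.lower().split()

# Ratio for growing prefixes of the same text.
for n in (3, 6, len(tokens)):
    prefix = tokens[:n]
    print(n, "tokens,", len(set(prefix)), "types, ratio",
          round(type_token_ratio(prefix), 3))
```

As more of the text is read, repeated words accumulate and the ratio falls, so ratios computed on texts of different lengths are not directly comparable.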
A brief introduction to SAS -- Data description and simple inference -- Multiple regression -- Analysis of variance -- Analysis of repeated measures -- Logistic regression -- Analysis of survival times -- Principal components and factor analysis -- Cluster analysis -- Discriminant analysis -- Correspondence analysis
Speeded visual word naming and lexical decision performance are reported for 2428 words for young adults and healthy older adults. Hierarchical regression techniques were used to investigate the unique predictive variance of phonological features in the onsets, lexical variables (e.g., measures of consistency, frequency, familiarity, neighborhood size, and length), and semantic variables (e.g., imageability and semantic connectivity). The influence of most variables was highly task dependent, with the results shedding light on recent empirical controversies in the available word recognition literature. Semantic-level variables accounted for unique variance in both speeded naming and lexical decision performance, with the latter task producing the largest semantic-level effects. Discussion focuses on the utility of large-scale regression studies in providing a complementary approach to the standard factorial designs to investigate visual word recognition.
A technique for studying the relationship between brain and language, which involves correlating scores on two continuous variables, signal intensity across the entire brains of brain-damaged patients and behavioral priming scores, was used to investigate a central issue in cognitive neuroscience: Are the components of the neural language system organized as a single undifferentiated process, or do they respond differentially to different types of linguistic structure? Differences in lexical structure, in the form of the regular and irregular past tense, have proven to be critical in this debate by contrasting a highly predictable rule-like process (e.g., jump-jumped) with an unpredictable idiosyncratic process typified by the irregulars (e.g., think-thought). The key issue raised by these contrasts is whether processing regular and irregular past tense forms differentially engages different aspects of the neural language system or whether they are processed within a single system that distinguishes between them purely on the basis of phonological and semantic differences. The correlational analyses provide clear evidence for a functional differentiation between different brain regions associated with the processing of lexical form, meaning, and morphological structure.
• morphology
• neuroscience
The contribution of language history to the study of the early dispersals of modern humans throughout the Old World has been limited by the shallow time depth (about 8000 ± 2000 years) of current linguistic methods. Here it is shown that the application of biological cladistic methods, not to vocabulary (as has been previously tried) but to language structure (sound systems and grammar), may extend the time depths at which language data can be used. The method was tested against well-understood families of Oceanic Austronesian languages, then applied to the Papuan languages of Island Melanesia, a group of hitherto unrelatable isolates. Papuan languages show an archipelago-based phylogenetic signal that is consistent with the current geographical distribution of languages. The most plausible hypothesis to explain this result is the divergence of the Papuan languages from a common ancestral stock, as part of late Pleistocene dispersals.
The Scientific Method: A Process for Learning -- The Role of Statistics in the Scientific Method -- Main Approaches to Statistics -- Purpose and Organization of This Text
Although Clark's (1973) critique of statistical procedures in language and memory studies (the "language-as-fixed-effect fallacy") has had a profound effect on the way such analyses have been carried out in the past 20 years, it seems that the exact nature of the problem and the proposed solution have not been understood very well. Many investigators seem to assume that generalization to both the subject population and the language as a whole is automatically ensured if separate subject (F1) and item (F2) analyses are performed and that the null hypothesis may safely be rejected if these F values are both significant. Such a procedure is, however, unfounded and not in accordance with the recommendations of Clark (1973). More importantly and contrary to current practice, in many cases there is no need to perform separate subject and item analyses since the traditional F1 is the correct test statistic. In particular this is the case when item variability is experimentally controlled by matching or by counterbalancing.
Current investigators of words, sentences, and other language materials almost never provide statistical evidence that their findings generalize beyond the specific sample of language materials they have chosen. Nevertheless, these same investigators do not hesitate to conclude that their findings are true for language in general. In so doing, it is argued, they are committing the language-as-fixed-effect fallacy, which can lead to serious error. The problem is illustrated for one well-known series of studies in semantic memory. With the appropriate statistics these studies are shown to provide no reliable evidence for most of the main conclusions drawn from them. A review of other experiments in semantic memory shows that many of them are likewise suspect. It is demonstrated how this fallacy can be avoided by doing the right statistics, selecting the appropriate design, and sampling by systematic procedures, or, alternatively, by proceeding according to the so-called method of single cases.
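The two abstracts above discuss separate subject (F1) and item (F2) analyses but do not spell out a combined statistic. One statistic commonly attributed to Clark (1973) in this context is min F′, computed from F1 and F2 and their error degrees of freedom. The sketch below assumes that standard formulation; the function name `min_f_prime` and the example values are hypothetical and are not taken from either paper.

```python
def min_f_prime(f1, f2, df_error_f1, df_error_f2):
    """Return (min F', denominator df) from by-subject and by-item F ratios.

    Assumes the usual formulation:
        min F' = F1 * F2 / (F1 + F2)
        df     = (F1 + F2)^2 / (F1^2 / df_error_f2 + F2^2 / df_error_f1)
    The numerator df is that of the treatment effect and is unchanged.
    """
    if f1 <= 0 or f2 <= 0:
        raise ValueError("F ratios must be positive")
    value = f1 * f2 / (f1 + f2)
    denom_df = (f1 + f2) ** 2 / (f1 ** 2 / df_error_f2 + f2 ** 2 / df_error_f1)
    return value, denom_df

# Hypothetical example: F1 = 10 with 20 error df, F2 = 5 with 10 error df.
value, denom_df = min_f_prime(10.0, 5.0, df_error_f1=20, df_error_f2=10)
print("min F' =", round(value, 3), "with denominator df =", round(denom_df, 2))
```

Because min F′ is always smaller than both F1 and F2, two individually significant F values do not guarantee a significant combined test, which is consistent with the warning in the first abstract that jointly significant F1 and F2 are not a sufficient criterion.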
Introduction and Historical Perspective; Technical Background; Experimental Experience; Summary, Interpretation, and Examples of Diagnosing Actual Data for Collinearity; Appendix 3A: The Condition Number and Invertibility; Appendix 3B: Parameterization and Scaling; Appendix 3C: The Weakness of Correlation Measures in Providing Diagnostic Information; Appendix 3D: The Harm Caused by Collinearity
This research exploits the English and Dutch CELEX lexical database to investigate the form similarity relations between words. Lexical statistics analyses replicate and extend the findings of Landauer and Streeter (1973) concerning the relation between a word's frequency and the density and frequency of its similarity neighborhood. The results for both Dutch and English reveal only a weak tendency for high-frequency written and spoken words to have more neighbors than rare words and for these neighbors to be more frequent than those of rare words. However, the number of neighbors was found to correlate more highly with bigram frequency than with word frequency. To clarify the relations between these properties, a stochastic model is presented which captures the relevant effects of phonotactic structure on neighborhood similarities. The implications of these findings for models of language production and comprehension are considered.
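The notions of neighborhood density and neighborhood frequency can be made concrete with a small sketch. It assumes one common definition of a neighbor, namely a word of the same length that differs in exactly one position; the study itself may use a different similarity criterion. The toy lexicon and its frequencies are invented for illustration and are not taken from CELEX.

```python
def is_neighbor(a, b):
    """True if a and b have equal length and differ in exactly one position."""
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1


def neighborhood(word, lexicon):
    """Return the sorted list of neighbors of `word` in `lexicon`."""
    return sorted(w for w in lexicon if is_neighbor(word, w))


# Invented toy lexicon mapping word forms to frequencies (illustration only).
toy_freq = {"cat": 120, "cot": 15, "cut": 60, "bat": 40,
            "hat": 55, "dog": 90, "cart": 8}

neighbors = neighborhood("cat", toy_freq)
density = len(neighbors)  # neighborhood density
# Mean frequency of the neighbors (0.0 if the word has no neighbors).
mean_neighbor_freq = (sum(toy_freq[w] for w in neighbors) / density
                      if density else 0.0)
print(neighbors, density, mean_neighbor_freq)
```

Computing both quantities for every word in a lexicon and correlating them with word frequency would give the kind of lexical statistics the abstract describes.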
Similarities and differences between speech and writing have been the subject of innumerable studies, but until now there has been no attempt to provide a unified linguistic analysis of the whole range of spoken and written registers in English. In this widely acclaimed empirical study, Douglas Biber uses computational techniques to analyse the linguistic characteristics of twenty-three spoken and written genres, enabling identification of the basic, underlying dimensions of variation in English. In Variation Across Speech and Writing, six dimensions of variation are identified through a factor analysis, on the basis of linguistic co-occurrence patterns. The resulting model of variation provides for the description of the distinctive linguistic characteristics of any spoken or written text and demonstrates the ways in which the polarization of speech and writing has been misleading, and thus enables reconciliation of the contradictory conclusions reached in previous research.
In spontaneous speech, words are often pronounced in reduced form. Some words are reduced to such an extent that an orthographic transcription would be very different from the orthographic norm. An example from Dutch is the word MOGE-LIJK (‘poss-ible’), which can be pronounced not only as MO.GE.LEK but also as MO.GEK, MO.LEK, or even as MOK.
Ever since Gernsbacher (1984), it has been widely believed that word frequency counts based on corpora are unreliable, particularly for the highest and lowest frequency words, due to regression towards the mean. In this study, however, we show that word frequency counts across corpora are not subject to regression towards the mean, either in theory or in practice. Sampling error due to underdispersion, however, remains a serious concern.
This paper describes a population model for word frequency distributions based on the Zipf-Mandelbrot law, corresponding to the word frequency distribution induced by a random character sequence. The model, which has convenient analytical and numerical properties, is shown to be adequate for the description of language data extracted by automatic means from large text corpora. It can thus be used to study the problems faced by the statistical analysis of such data in the field of natural-language processing.
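The Zipf-Mandelbrot law states that the probability of the type with rank r is proportional to 1 / (r + b)^a. The sketch below evaluates this law for a finite number of types and normalizes the result to a probability distribution. The truncation to a finite population, the function name, and the parameter values are assumptions made for illustration; the model in the paper may be parameterized differently.

```python
def zipf_mandelbrot(n_types, a, b):
    """Return probabilities p(r) proportional to 1 / (r + b) ** a, r = 1..n_types."""
    weights = [1.0 / (rank + b) ** a for rank in range(1, n_types + 1)]
    total = sum(weights)  # normalizing constant for the truncated population
    return [w / total for w in weights]


# Hypothetical parameter values, chosen only for illustration.
probs = zipf_mandelbrot(n_types=1000, a=1.1, b=2.7)
print(probs[:3])
```

A larger b flattens the distribution among the top-ranked types, while a controls how quickly the probabilities fall off in the tail.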
Balota et al. [Balota, D., Cortese, M., Sergent-Marshall, S., Spieler, D., & Yap, M. (2004). Visual word recognition for single-syllable words. Journal of Experimental Psychology: General, 133, 283–316] studied lexical processing in word naming and lexical decision using hierarchical multiple regression techniques for a large data set of monosyllabic, morphologically simple words. The present study supplements their work by making use of more flexible regression techniques that are better suited for dealing with collinearity and non-linearity, and by documenting the contributions of several variables that they did not take into account. In particular, we included measures of morphological connectivity, as well as a new frequency count, the frequency of a word in speech rather than in writing. The morphological measures emerged as strong predictors in visual lexical decision, but not in naming, providing evidence for the importance of morphological connectivity even for the recognition of morphologically simple words. Spoken frequency was predictive not only for naming but also for visual lexical decision. In addition, it co-determined subjective frequency estimates and norms for age of acquisition. Finally, we show that frequency predominantly reflects conceptual familiarity rather than familiarity with a word’s form.
Are regular morphologically complex words stored in the mental lexicon? Answers to this question have ranged from full listing to parsing for every regular complex word. We investigated the roles of storage and parsing in the visual domain for the productive Dutch plural suffix -en. Two experiments are reported that show that storage occurs for high-frequency noun plurals. A mathematical formalization of a parallel dual-route race model is presented that accounts for the patterns in the observed reaction time data with essentially one free parameter, the speed of the parsing route. Parsing for noun plurals appears to be a time-costly process, which we attribute to the ambiguity of -en, a suffix that is predominantly used as a verbal ending. A third experiment contrasted nouns and verbs. This experiment revealed no effect of surface frequency for verbs, but again a solid effect for nouns. Together, our results suggest that many noun plurals are stored in order to avoid the time-costly resolution of the subcategorization conflict that arises when the -en suffix is attached to nouns.
Data from repeated measures experiments are usually analyzed with conventional ANOVA. Three well-known problems with ANOVA are the sphericity assumption, the design effect (sampling hierarchy), and the requirement for complete designs and data sets. This tutorial explains and demonstrates multi-level modeling (MLM) as an alternative analysis tool for repeated measures data. MLM allows us to estimate variance and covariance components explicitly. MLM does not require sphericity, it takes the sampling hierarchy into account, and it is capable of analyzing incomplete data. A fictitious data set is analyzed with MLM and ANOVA, and analysis results are compared. Moreover, existing data from a repeated measures design are re-analyzed with MLM, to demonstrate its advantages. Monte Carlo simulations suggest that MLM yields higher power than ANOVA, in particular under realistic circumstances. Although technically complex, MLM is recommended as a useful tool for analyzing repeated measures data from speech research.
Linguists and speech researchers who use statistical methods often need to estimate the frequency of some type of item in a population containing items of various types. A common approach is to divide the number of cases observed in a sample by the size of the sample; sometimes small positive quantities are added to divisor and dividend in order to avoid zero estimates for types missing from the sample. These approaches are obvious and simple, but they lack principled justification, and yield estimates that can be wildly inaccurate. I.J. Good and Alan Turing developed a family of theoretically well‐founded techniques appropriate to this domain. Some versions of the Good‐Turing approach are very demanding computationally, but we define a version, the Simple Good‐Turing estimator, which is straightforward to use. Tested on a variety of natural‐language‐related data sets, the Simple Good‐Turing estimator performs well, absolutely and relative both to the approaches just discussed and to other, more sophisticated techniques.
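To make the underlying idea concrete, here is a minimal sketch of the basic, unsmoothed Turing estimate: the probability mass of unseen types is estimated as N1 / N, and an observed count r is adjusted to r* = (r + 1) N(r+1) / N(r), where N(r) is the number of types seen exactly r times. This is not the Simple Good-Turing estimator itself, which additionally smooths the N(r) values; the function name and the toy sample are assumptions for illustration.

```python
from collections import Counter


def turing_estimates(counts):
    """Return (estimated mass of unseen types, {r: adjusted count r* or None})."""
    n_total = sum(counts.values())             # sample size N
    freq_of_freq = Counter(counts.values())    # N(r): number of types seen r times
    p_unseen = freq_of_freq.get(1, 0) / n_total
    adjusted = {}
    for r, n_r in sorted(freq_of_freq.items()):
        n_next = freq_of_freq.get(r + 1, 0)
        # Without smoothing, r* is undefined when no type occurs r + 1 times.
        adjusted[r] = (r + 1) * n_next / n_r if n_next else None
    return p_unseen, adjusted


# Invented toy sample of ten tokens (illustration only).
sample = Counter("a a a b b c c d e f".split())
p_unseen, adjusted = turing_estimates(sample)
print(p_unseen, adjusted)
```

The `None` entry for the highest count shows why the raw estimate breaks down where the N(r) values become sparse, which is the gap that smoothing is meant to fill.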
This paper addresses the relation between meaning, lexical productivity, and frequency of use. Using density estimation as a visualization tool, we show that differences in semantic structure can be reflected in probability density functions estimated for word frequency distributions. We call attention to an example of a bimodal density, and suggest that bimodality arises when distributions of well-entrenched lexical items, which appear to be lognormal, are mixed with distributions of productively created nonce formations.
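Density estimation of a word frequency distribution can be sketched with a Gaussian kernel density estimate over log-transformed frequencies. The fixed bandwidth, the log transformation, and the toy frequencies below are assumptions for illustration; the paper's own estimator and data may differ. The toy sample deliberately mixes very rare and fairly frequent items so that the estimated density has two modes.

```python
import math


def gaussian_kde(sample, bandwidth):
    """Return a function that evaluates a Gaussian kernel density estimate."""
    n = len(sample)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        # Sum of Gaussian kernels centred on each observation.
        return norm * sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2)
                          for xi in sample)

    return density


# Invented frequencies: a cluster of rare items and a cluster of frequent ones.
toy_freqs = [1, 1, 1, 2, 2, 3, 150, 200, 260, 300, 420]
log_freqs = [math.log(f) for f in toy_freqs]

density = gaussian_kde(log_freqs, bandwidth=0.5)
grid = [i * 0.1 for i in range(-20, 90)]   # log-frequency values from -2.0 to 8.9
curve = [density(x) for x in grid]
```

Plotting `curve` against `grid` would show one peak near log frequency 0 and a second peak above 5, with a trough in between.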
How do people know as much as they do with as little information as they get? The problem takes many forms; learning vocabulary from text is an especially dramatic and convenient case for research. A new general theory of acquired similarity and knowledge representation, latent semantic analysis (LSA), is presented and used to successfully simulate such learning and several other psycholinguistic phenomena. By inducing global knowledge indirectly from local co-occurrence data in a large body of representative text, LSA acquired knowledge about the full vocabulary of English at a comparable rate to schoolchildren. LSA uses no prior linguistic or perceptual similarity knowledge; it is based solely on a general mathematical learning method that achieves powerful inductive effects by extracting the right number of dimensions (e.g., 300) to represent objects and contexts. Relations to other theories, phenomena, and problems are sketched.