Douglas Biber’s research while affiliated with Northern Arizona University and other places


Publications (243)


Figure 1. Variation across registers for 1C features: attributive adjectives and prepositional phrases as noun modifiers (per 100,000 words) [based on GSWE
Figure 4. Heatmap of text-level correlations
The results shown in Figure 4 provide the details (i.e., the pairwise correlations) for the statistical findings regarding Cells 1C and 3A in BLH (2024a). Thus, each of the features in Cell 1C (Adj+N, N+N, N+OF, N+Prep) has strong positive correlations with all the other features in this cell (ranging from +0.54 to +0.80). Similarly, each of the features in Cell 3A (FiniteAdvlCls, Verb+THAT, Verb+WH) has strong or moderately strong positive correlations with all the other features in that cell (ranging from +0.47 to +0.38). And finally, each of the features in Cell 1C has strong or moderately strong inverse (negative) correlations with each of the features in Cell 3A (ranging from -0.65 to -0.35). Beyond that pattern, there are relatively few pairwise comparisons in Figure 4 that show even moderately strong correlations between complexity features. In addition, it is noteworthy that those other correlations do not indicate the existence of additional groupings beyond 1C and 3A. Rather, to the extent that there are other moderately strong correlations, they occur between an additional feature and either the 1C grouping or the 3A grouping (i.e., rather than forming the nucleus for a third major grouping). Three features are especially noteworthy in this regard: EDRel, Prep+ING, and AdvAdvl. EDRel and Prep+ING both have strong or moderately strong positive correlations with all the features in Cell 1C (ranging from +0.43 to +0.55), but no strong correlations with any other complexity feature. These features also have inverse correlations with all 3A features (ranging from -0.16 to -0.31). AdvAdvl has strong or moderately strong positive correlations with all the features in Cell 3A (ranging from +0.44 to +0.60), but no strong correlations with any other complexity feature. And this feature has very strong negative correlations with all the features in Cell 1C (ranging from -0.85 to -0.94).
Thus, the noteworthy finding here is that the 1C vs 3A opposition (documented in BLH 2024a) actually includes a few additional features that differ in either their structural or syntactic characteristics. We return to this finding in our discussion below.
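The kind of pairwise text-level correlation summarized in a heatmap can be illustrated with a minimal sketch. The feature labels (Adj+N, N+N, Verb+THAT) come from the excerpt; the per-text rates are invented purely for illustration, and the helper names `pearson_r` and `pairwise_correlations` are hypothetical, not the authors' code or data.

```python
from itertools import combinations
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products and sums of squares around the means
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def pairwise_correlations(rates):
    """Map each unordered pair of feature names to its Pearson r.

    `rates` maps a feature name to its rate of occurrence in each text,
    with the texts in the same order for every feature.
    """
    return {
        (a, b): pearson_r(rates[a], rates[b])
        for a, b in combinations(rates, 2)
    }


# Hypothetical per-text rates (invented numbers, NOT data from the study).
toy_rates = {
    "Adj+N": [62.0, 55.0, 30.0, 21.0, 48.0],
    "N+N": [40.0, 38.0, 15.0, 9.0, 29.0],
    "Verb+THAT": [3.0, 5.0, 11.0, 14.0, 6.0],
}

for (a, b), r in pairwise_correlations(toy_rates).items():
    print(f"{a} vs {b}: r = {r:+.2f}")
```

In this toy data the two noun-modifier features rise and fall together across texts, while the finite-clause feature moves in the opposite direction. That mirrors the sign pattern described for Cells 1C and 3A, though not the reported values.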
Figure 5. Heatmap of register-level correlations
Similar to Figure 4, the strongest correlations shown in Figure 5 are among the 1C features and among the 3A features. All 1C features have correlations > 0.85 with one another, while all 3A features have correlations > 0.65 with one another. The inverse correlations between 3A features and 1C features are also quite remarkable, ranging from -0.64 to -0.91. Although our focus here is on the strength of these correlations, most of them are also statistically significant. (The critical value at p < .05 for a one-tailed test (i.e., hypothesizing a positive correlation) with N=8 is r = .549.) Thus, the existence and strength of a higher-order bipolar dimension opposing stereotypical phrasal versus stereotypical clausal complexity features is strongly confirmed in Figure 5. There are many additional cells in Figure 5 that show strong correlations. But surprisingly, similar to Figure 4, these other correlations do not indicate the existence of additional groupings beyond 1C and 3A. Rather, almost all of these other correlations occur between an additional feature and either the 1C grouping or the 3A grouping. To better interpret the composition of these groupings, Figure 6 presents a heatmap based on the same bivariate correlations as in Figure 5, but reorganized to group together the features that have positive register-level correlations with 1C features (the upper-left triangle in the figure, shown in blue) versus the features that have positive register-level correlations with 3A features (the lower-right triangle in the figure, also shown in blue). The inverse correlations between these two groupings of features are shown in red in the lower-left rectangle of the figure. Figure 6 shows that eight additional features (EDRel, INGRel, Noun+TO, Prep+ING, Adj+Prep, TOAdvl, INGAdvl, PrepAdvl) have strong correlations with all 1C features (as well as generally strong correlations with one another), coupled with
Figure 7. Heatmap of Cohen's d values for the rate of occurrence of each complexity feature in each register
Phrasal/clausal complexity features in the grammatical system of English, organized according to major structural type and syntactic function
Accounting for the entire system of complexity features: Evidence for general oral versus literate grammatical complexity dimensions
  • Article
  • Full-text available

May 2025

·

134 Reads

Corpus Linguistics and Linguistic Theory

Douglas Biber

Two recent studies (Biber, Larsson, & Hancock 2024a,b) evaluate how well theory-based models account for the distributions of complexity features across spoken and written texts. Those studies provide strong evidence for two groupings of complexity features: phrases functioning syntactically as noun modifiers, and finite dependent clauses functioning as clause-level constituents. At the same time, those studies fail to identify systematic patterns of covariation for the other complexity features (e.g., phrases functioning as clause-level constituents). The present study picks up where those previous studies left off, exploring the possibility that complexity features pattern together in systematic ways at the register level, even though they have less strong patterns of covariation across individual texts. The results show that: 1) all 25 features can be grouped into one of two groupings, referred to as the 'oral' and 'literate' complexity dimensions; and 2) those two dimensions have a strong complementary relation to one another. These general patterns are described and interpreted relative to the particular features grouped into each dimension, the register distributions associated with each dimension, and the extent to which these register-level patterns are found at the text level.

Achieving stability in corpus-based analysis of word types

May 2025

·

36 Reads

International Journal of Corpus Linguistics

Rank-ordered lists of word types are ubiquitous in corpus linguistics and applied linguistics. Word lists are commonly developed as aids for language teaching and learning, vocabulary testing, and language description. Yet, these lists are often produced and used without evaluation of their stability — or replicability — across corpus samples. Our primary objective in this paper is to describe the cumulative state of knowledge regarding the stability of corpus-based word type lists, focusing on three goals that motivate the creation and use of rank-ordered lists: identifying key lexical items for learning or teaching, assessing vocabulary size or knowledge, and identifying all items in a language domain. We show that word type lists are far less stable than researchers and practitioners often assume, although there is substantial variability in stability depending on the goals and methods behind list creation.


And tonight, shocking revelations about TV news broadcasts: A hybrid spoken/written register, or a unique register?

February 2025

·

8 Reads

Register Studies

TV news broadcasts (TVNBs) represent an unusual mix of oral/literate situational characteristics. In addition, TVNBs have shifted historically to become increasingly focused on ‘soft’ human-interest stories. Thus, we have good reason to predict that present-day TVNBs will incorporate a hybrid blend of oral and literate lexico-grammatical characteristics. The present study explores this possibility through a corpus-based analysis of TVNBs compared to conversation and newspaper writing. In part, the results confirm these prior expectations. Surprisingly, though, a more detailed analysis shows that this mixture of situational characteristics has given rise to a unique grammatical style of discourse rather than a hybrid style. In particular, TVNBs can be analyzed to a large extent as main clauses where the finite verb has been deleted. This discourse style is interesting because it cannot be attributed to challenging production circumstances. Rather, the functional motivation seems related to creating a perception of urgency and excitement.


Encouraging cumulative knowledge building as normal practice in (learner) corpus research

November 2024

·

592 Reads

International Journal of Learner Corpus Research

It is not always clear how a cumulative knowledge mindset can permeate the normal primary research studies that we do. That is, both secondary research and replication studies are focused on the goal of establishing what we know; specifying the state of the art for our current state of knowledge. But what about new primary studies, designed with the intent of moving the field forward? Is it possible to also base those on a foundation of cumulative knowledge? We imagine that most readers will react to this question by saying ‘yes, of course – this is what I always do!’. However, there is a crucial distinction that we hope to explore: the difference between motivating a new primary study by identifying a gap that has never been studied before, versus motivating a new primary study by first documenting what we currently know about a topic, and then using that to identify the mysteries that need to be resolved. Both approaches are based on a thorough survey of previous research. But the first focuses on what was done in previous studies, while the second focuses on what we have learned from previous studies. Our broader agenda in this special issue is to encourage the adoption of this second perspective – the cumulative knowledge approach – as normal practice for primary studies. Thus, we explore this possibility through a series of case studies. In this introductory article, we lay the groundwork for those case studies.


Figure 1. Repeated exploratory vs. cumulative research designs
Figure 3. Mean per-text frequencies of each feature along with the mode mean for the written registers
Figure 4. Mean per-text frequencies of each feature along with the mode mean for the spoken registers
Pairings of corpora for situational characteristics
Overview of corpora used
On the role of cumulative knowledge building and specific hypotheses: The case of grammatical complexity

November 2024

·

896 Reads

·

10 Citations

Corpora

Larsson, T., Biber, D., Hancock, G. R. (2024). On the role of cumulative knowledge building and specific hypotheses: The case of grammatical complexity. Corpora, 19(3).

Abstract: As corpus linguistics matures as a field, there is an increasing number of research areas in which we have accrued sufficient knowledge that we can start building knowledge in a cumulative manner by (a) synthesizing findings and generalizations made by previous research and interpreting new findings in relation to those, and (b) formulating and testing increasingly specific predictions/hypotheses resulting from (a). The present paper outlines what a move toward cumulative knowledge building may look like for the field and offers a case study on grammatical complexity as illustration. In building knowledge in a more systematic way, we can engage more deeply with the claimed generalizable findings from previous research and help move the field's state-of-the-art forward.


Grammatical Analysis Is Required to Describe Grammatical (and “Syntactic”) Complexity: A Commentary on “Complexity and Difficulty in Second Language Acquisition: A Theoretical and Methodological Overview”

October 2024

·

381 Reads

·

4 Citations

Our response is focused on the analysis of grammatical complexity in learner language. We fully support the distinction between complexity and difficulty advocated by Bulté et al., and we agree with their characterization of 'complexity' (as opposed to 'difficulty') as referring 'to the structural characteristics of linguistic items/structures and texts.' We are surprised, though, by Bulté et al.'s recommendation that this construct of grammatical complexity can be adequately captured by omnibus measures that disregard and confound the different influences of multiple structural and syntactic factors. That is, even though Bulté et al. recommend using multiple omnibus measures that are intended to capture different 'subdimensions' of complexity, all of those measures are extremely general, disregarding analysis of particular grammatical structures and completely disregarding analysis of syntactic function. Thus, the methods recommended by Bulté et al. (and those generally practiced by SLA researchers) are based on the implicit assumption that grammatical complexity can be described without actually carrying out a careful grammatical/syntactic analysis. As opposed to that approach, we argue that any adequate description of grammatical complexity must be based on a principled linguistic analysis of grammatical structures and syntactic functions. To illustrate how anomalous the 'omnibus' approach is, imagine a team of biologists who want to describe and compare the complexity of forests. These researchers are aware of the incredible diversity in the composition of forests. For example, one forest is composed of deciduous and coniferous trees from many different species, at different stages of maturity, growing with different extents of density, with undergrowth representing many different plant species; another forest is composed entirely of pine trees all at a single stage of maturity with no undergrowth. 
However, the researchers decide to disregard what they know about the biology of trees and plants, and instead ...

1. It is surprising to us that researchers in SLA strongly prefer the label "syntactic" complexity as opposed to "grammatical" complexity, even though almost all SLA studies of complexity completely disregard analysis of syntactic function, focusing instead solely on structural characteristics. We explain this distinction further in our response below.


Building LANA-CASE, a spoken corpus of American English conversation: Challenges and innovations in corpus compilation

October 2024

·

65 Reads

·

1 Citation

Research in Corpus Linguistics

Elizabeth Hanks · Tony McEnery · [...]

The Lancaster-Northern Arizona Corpus of Spoken American English (LANA-CASE) is a collaborative project between Lancaster University and Northern Arizona University to create a publicly available, large-scale corpus of American English conversation. In this article, we describe the design of LANA-CASE in terms of the challenges that have arisen and how these have been addressed – including decisions related to operationalizing the domain, sampling the data, recruiting participants, and selecting instruments for data collection. In addressing these challenges, we were able to draw on and further develop strategies established in the creation of other spoken corpora (including the British English counterpart to LANA-CASE, the Spoken British National Corpus 2014) as well as to implement recent theoretical and technical innovations related to each step. We hope that this discussion can inform future projects focused on the design and construction of spoken corpora.


Comparing theory-based models of grammatical complexity in student writing

July 2024

·

783 Reads

·

4 Citations

International Journal of Learner Corpus Research

The present study tests the empirical adequacy of competing models of grammatical complexity in university student writing, based on analysis of disciplinary texts from L1-English and L2-English students. The results show that grammatical complexity in student writing must be treated as a multi-dimensional linguistic construct, distinguishing among both structural types and syntactic functions. We compare the results here to previous research (Biber et al., 2024a, b), showing a similar patterning of complexity features in student writing and the broader domain of general writing. Two of these groupings – dependent phrases functioning as noun modifiers, and finite dependent clauses functioning as clause-level constituents – are especially interesting. These two groupings represent the strongest co-occurrence patterns in general writing, but only the dependent clause grouping is represented in student writing. This discrepancy is interpreted relative to the development of advanced proficiency in the use of complexity features by university students.


Register and the dual nature of functional correspondence: accounting for text-linguistic variation between registers, within registers, and without registers

May 2024

·

94 Reads

·

4 Citations

Corpus Linguistics and Linguistic Theory

During the past 20 years, corpus linguistic research on register variation has yielded important theoretical advances. The first part of this paper discusses these advances and the cumulative body of research that has produced them. In the second part of the paper, we focus on the goals of research on register variation. The traditional goal of the text-linguistic (TxtLx) approach to linguistic variation has been to describe registers and patterns of register variation: describing the linguistic and situational characteristics of registers. In this paper, we explore a related, but distinct, text-linguistic goal: to account for all linguistic variation among texts. Because the TxtLx framework assumes the importance of functional correspondence between linguistic characteristics and situational characteristics, it is reasonable to assume that in addition to register, we can use situational parameters coded continuously at the level of individual texts as additional predictors of text-linguistic variation. We describe the results of an empirical study to show that using both register categories and text-level situational parameters as predictors results in a more comprehensive and explanatory model of text-linguistic variation. In the conclusion we discuss the future of corpus-based register studies, focusing on unanswered questions related to theoretical claims about register.



Citations (64)


... Their article has provoked several "open peer commentary" responses in the same journal. The response by Biber et al. (2024) points out that the "omnibus" measures suggested in the original article disregard the syntactic functions of grammatical structures. They liken this to a biologist taking a reductionist method and operationalizing the complexity of forests by simply calculating the average height of trees and the mean number of branches per tree. ...

Reference:

Construction Complexity Calculator: A tool for calculating Nelson's (2024) construction-based complexity measure
Grammatical Analysis Is Required to Describe Grammatical (and “Syntactic”) Complexity: A Commentary on “Complexity and Difficulty in Second Language Acquisition: A Theoretical and Methodological Overview”

... While predominantly concerned with tenor, it also acknowledges the influence of field and mode and how situational contexts may shape evaluative language choices. The present study extends the framework to register, recognising that appraisal markers encode communicative goals and wider discourse contexts (Egbert et al., 2024). Unlike studies on grammatical stance (Biber, 1988; Larsson et al., 2024), which tend to focus on (lexico)grammatical features, this paper examines the variability of evaluative meaning, thus providing a complementary perspective. ...

Register and the dual nature of functional correspondence: accounting for text-linguistic variation between registers, within registers, and without registers
  • Citing Article
  • May 2024

Corpus Linguistics and Linguistic Theory

... To illustrate, we will use data from Biber et al. (2025) and compare mean frequencies. The research question is whether L1 and L2 English students' frequency of use of the complexity feature attributive adjective is similar enough to merge the two groups into a single 'student group' that could subsequently be compared to published writers. ...

Comparing theory-based models of grammatical complexity in student writing

International Journal of Learner Corpus Research

... Like all categorizations, these fits are not naturally given either, but are subject to complex processes of decision-making and insight (see also Larsson and Biber 2024). ...

On the Perils of Linguistically Opaque Measures and Methods

... The results show that grammatical complexity in student writing must be treated as a multi-dimensional linguistic construct, distinguishing among both structural types and syntactic functions. We compare the results here to previous research (Biber et al., 2024a, 2024b), showing a similar patterning of complexity features in student writing and the broader domain of general writing. Two of these groupings – dependent phrases functioning as noun modifiers, and finite dependent clauses functioning as clause-level constituents – are especially interesting. ...

Dimensions of Text Complexity in the Spoken and Written Modes: A Comparison of Theory-Based Models

Journal of English Linguistics

... That this context is, however, included in the authors' study is apparent from the examples in (6) to (10). Biber et al. 2023: 125) (10) an enigmatic pattern of egg size variation (example (54) in Biber et al. 2023: 126) The authors do acknowledge that 'the of-genitive is categorically required when the head noun is indefinite (i.e. some members of his cabinet)' (Biber et al. 2023: 97). ...

Expanding the scope of grammatical variation: towards a comprehensive account of genitive variation across registers

English Language and Linguistics

... Proponents of single structural complexity measurement rely on the register-functional approach in which specific linguistic patterns are associated with specific developmental stages (Lan, Lucas, and Sun 2019; Biber, Larsson, and Hancock 2023; Larsson et al. 2023). By contrasting the use of clause structures versus phrase structures, these authors show that the development of L2 academic writing could be measured by singling out specific structures. ...

Exploring early L2 writing development through the lens of grammatical complexity

Applied Corpus Linguistics

... Such an assumption conflicts with the conception of register as 'the patterns of instantiation of the overall (language) system associated with a given type of context' (Halliday and Matthiessen 2004: 29), which Diaz-Legaspe cites as a source of her understanding of register (Diaz-Legaspe 2020: fn. 21) and is in tension with standard conceptions of register, see, e.g., (Biber and Finegan 1994). If a register is defined as the pattern of language associated with a context of use or situation type, then a word's belonging to a register associated with a particular situation type depends on whether the word tends to be used in situations of the particular type. ...

Sociolinguistic Perspectives On Register
  • Citing Book
  • October 2023

... This has certainly been the case over the last decade such that the two approaches now exist in a synergistic relationship, resulting in a stronger methodological framework, as evidenced by the number of publications in the area. Witness the number of edited volumes (Biber et al., 2007; Baker and McEnery, 2015; Taylor and Marchi, 2018 among others), various handbook chapters, all with some overlapping but different foci (Partington and Marchi, 2015; Subtirelu and Baker, 2018; Gray and Biber, 2021; Mautner, 2022) and a full-length handbook devoted to the field (Friginal and Hardy, 2021). The Journal of Corpora and Discourse Studies, established in 2017, publishes studies that use corpus methods to research discourse phenomena. ...

Corpus-based discourse analysis
  • Citing Chapter
  • January 2021