Martin Schweinberger

The University of Queensland | UQ · School of Languages and Cultures

Dr.


Publications: 35
Reads: 19,755
Citations: 244
Introduction
I'm a quantitative corpus linguist specializing in computational analyses of text and speech data. I work as Lecturer in Applied Linguistics at the University of Queensland and as Associate Professor II in the AcqVA-Aurora Center at the Arctic University of Norway in Tromsø. I established and direct the Language Technology and Data Analysis Laboratory (LADAL). I'm also an ICAME board member, CI of the Australian Text Analytics Platform, and involved in the Language Data Commons of Australia.
Additional affiliations
October 2018 - present
The University of Queensland
Position
  • Fellow
March 2018 - September 2018
University of Hamburg
Position
  • Principal Investigator
September 2016 - February 2018
University of Hamburg
Position
  • Professor
Education
October 2008 - November 2011
University of Hamburg
Field of study
  • English Linguistics
August 2006 - February 2007
University of Galway
Field of study
  • English Studies, Philosophy, Psychology
October 2000 - February 2008
Universität Kassel
Field of study
  • English Philology

Publications

Article
Full-text available
The focus of research on Singapore English has traditionally been on its structural features, while the relationship between English and other official languages of the Republic within the individual speaker has attracted much less interest, and comparatively little empirical data exist on the actual linguistic ecology of individual Singaporeans. T...
Article
Full-text available
This paper uses a bibliometric analysis to map the field of Corpus Linguistics (CL) research in arts and humanities over the last 20 years, while tracking changes in the popular CL research topics, outlets, highly-cited authors, and geographical origins of published CL research. Based on a collection of the metadata of 5,829 CL-related articles fro...
Article
This study analyzes the L1-acquisition of discourse like and its pragmatic functions in American English based on the Home-School Study of Language and Literacy Development component of the Child Language Data Exchange System (CHILDES). The data show that discourse like is already present in the speech of 3- and 4-year-old children and that even ve...
Article
Full-text available
This study takes a corpus-based approach to investigating ongoing change in the Australian English adjective amplifier system based on the Australian component of the International Corpus of English (ICE). The paper analyzes changes in amplifiers across apparent time, with special attention being placed on amplifier–adjective–bigram frequencies, to...
Article
Information in turns-at-talk is hierarchically structured with some information made more prominent than other information. A key indicator of information structure is nuclear stress. In this paper we aim to identify factors that correlate with nuclear stress placement. Based on exhaustive manual annotation of ten-word turns extracted from the demo...
Article
Full-text available
Despite a marked shift towards blended learning in EAP due to massification of higher education and the impact of COVID-19, relatively little is known about student engagement with EAP genre education as realised through blended learning. This paper explores international undergraduate/graduate students’ perceptions of using an online text, image a...
Article
Full-text available
Public discourse about the COVID-19 that appears on Twitter and other social media platforms provides useful insights into public concerns and responses to the pandemic. However, acknowledging that public discourse around COVID-19 is multi-faceted and evolves over time poses both analytical and ontological challenges. Studies that use text-mining a...
Article
Full-text available
Data-driven learning (DDL), or the use of language corpora for the purposes of language learning and teaching, has seen a marked increase in research interest within ICT-rich WEIRD (Western, Educated, Industrialised, Rich and Democratic) contexts. However, less is known about its adoption in nations such as Indonesia where ICT/CALL training is unde...
Article
This paper analyzes the use of very as an adjective amplifier by native speakers and advanced learners of English with diverse language backgrounds based on the International Corpus of Learner English (ICLE) and the Louvain Corpus of Native English Essays (LOCNESS). The study applies Multifactorial Prediction and Deviation Analysis Using Regression...
Article
One of the possible explanations for the decline in IQ test results over the last few decades is the effect computerised digitisation has on our communicative behaviour. In what I call the digital discourse mode the content of what is said is not interpreted but dealt with as data, which are processed mechanically, resulting in a Yes/No, Right/Wron...
Chapter
This study takes a corpus-based approach to examining co-occurrence patterns of amplifiers and adjectives based on the Irish and the New Zealand components of the International Corpus of English (ICE). The chapter investigates changes in amplifier-adjective-bigram frequencies, to provide insights into the mechanisms underlying lexical replacement....
Article
This study aims to exemplify how language teaching can benefit from learner corpus research (LCR). To this end, this study determines how L1 and L2 English speakers with diverse L1 backgrounds differ with respect to adjective amplification, based on the International Corpus of Learner English (ICLE) and the Louvain Corpus of Native English Essays (...
Article
Full-text available
Less is more? The impact of written corrective feedback on corpus-assisted L2 error resolution. The past decade has seen a sharp increase in research into L2 learners' direct use of language corpora (typically known as 'data-driven learning', DDL) for error resolution in L2 writing. However, a crucial yet underexplored variable in this pro...
Article
This paper investigates the use of speech-unit final like (SUF like) in standard Irish English (IrE) and takes a variationist approach based on the Irish component of the International Corpus of English (ICE-IRL). The analysis includes both sociolinguistic factors (age, gender, occupation type, religious affiliation, conversation type, audience si...
Article
Full-text available
This study details a replicable method for annotating emotionality of natural language that can be used in sociopragmatic, corpus-based analyses of discourse. A case study uses a type of sentiment analysis based on the crowd-sourced Word-Emotion Association Lexicon to investigate the social stratification of emotives, i.e. words associated with one...
Presentation
Full-text available
A corpus-based, quantitative analysis of ongoing change in the Australian English amplifier system.
Poster
Full-text available
The aim of the VowelChartProject (VCP) is to create personalized vowel charts for students learning English. In addition, the VCP is intended to demonstrate the usefulness of linguistics to laypeople. The VCP has been funded at the University of Hamburg since 2016 and is now also to be opened up to students at the University of Kassel...
Presentation
Full-text available
This corpus-based study examines the L1-acquisition of amplifiers in pre-adjectival slots in American English based on the Home-School Study of Language and Literacy Development (HSLLD) component of the Child Language Data Exchange System (CHILDES). The aim of this study is to add to research which focuses on when and how variation is acquired by...
Article
This paper investigates the use of eh in New Zealand English (NZE) and takes a quantitative, variationist approach based on the New Zealand component of the International Corpus of English (ICE New Zealand). The analysis includes both sociolinguistic (ethnicity, age, gender, occupation type) and psycholinguistic variables (priming). A mixed-effects...
Poster
Full-text available
The aim of the VowelChartProject (VCP) is to create personalized vowel charts (see the graphic on the right) for students learning English. The project is currently still based at the University of Hamburg but will soon also be made accessible to students at the University of Kassel. In an earlier project phase...
Presentation
This study takes a corpus-based approach to examining amplifying intensifiers across four genres in American English (AmE) using the Corpus of Historical American English (COHA) that comprises data from 1810 to 2000. From a language variation and change perspective intensifiers are particularly interesting as they play a crucial part in how speaker...
Presentation
Full-text available
Among the core functions of language is to communicate one’s emotional state to others. Yet, there exists relatively little corpus-based research on how speakers convey their emotions to others. Furthermore, sociolinguistic research on emotional language which is based on corpus data is even more scarce. To address this research gap, the current cor...
Article
This study analyzes the effect of sociolinguistic variables on the frequency of swear words in Irish English based on the private dialogue section of the Irish component of the International Corpus of English (ICE). The results of mixed-effects regression models show that speakers between 19 and 33 are substantially more likely to use swear words c...
Presentation
Full-text available
This study takes a corpus-based approach to examining co-occurrence patterns of amplifying intensifiers and adjectives (cf. 1) based on the New Zealand component of the International Corpus of English (ICE). From a language variation and change perspective intensifiers are particularly interesting as they play a crucial part in the “social and emot...
Chapter
Full-text available
This study compares the use of like in Irish English (IrE) to its use in southeastern British English (SE-BrE). There are significant differences between the use of like in IrE and SE-BrE in terms of overall frequency, social meaning and positioning. This paper argues that the differences in the use of like require a functional explanation on two l...
Thesis
Full-text available
The discourse marker LIKE is one of the most salient features of present‐day English. Despite being deemed archaic, dismissed as meaningless and considered symptomatic of careless speech, this non‐standard feature has received scholarly attention and attracted interest in the public media. In spite of being met with derision, its functional versati...
Preprint
Full-text available
The present study examines the use of the discourse marker LIKE in Northern Ireland. Previous claims concerning preferences for the employment of discourse marker LIKE based on the age and gender, as well as the acquisition of this marker are tested by employing a quantitative approach and statistical evaluation. The study takes a synchronic pers...
Chapter
Full-text available
This study compares the use of like in Irish English (IrE) to its use in southeastern British English (SE-BrE). There are significant differences between the use of like in IrE and SE-BrE in terms of overall frequency, social meaning, and positioning. This paper argues that the differences in the use of like require a functional explanation on two...

Questions

Question (1)
Question
Hi everyone,
I have used tree-based models (in my case Boruta) to streamline the model fitting process of mixed-effects modelling when dealing with many predictors.
In a comment on a recently submitted paper, a reviewer raised objections to this procedure based on Strobl, Malley & Tutz (2009), who state that "Note, however, that variable selection should not be conducted before applying another statistical method on the same learning data (Ambroise & McLachlan, 2002; Boulesteix et al., 2008; Leeb & Pötscher, 2006)."
However, immediately before they reject this procedure, Strobl, Malley & Tutz (2009) write that "In addition to this, a black box method like random forests can be used to identify a small number of potentially relevant predictors from the full feature list, which can then be processed (e.g., by means of a familiar parametric method). This two-stage approach has been successfully applied in a variety of applications (see, e.g., Ward et al., 2006)."
I do not see this two-stage procedure as problematic, but I am open to changing my mind and wanted to ask you for your thoughts on this. So, what do you think?
Strobl, Carolin, James Malley, and Gerhard Tutz. 2009. "An introduction to recursive partitioning: rationale, application, and characteristics of classification and regression trees, bagging, and random forests." Psychological Methods 14(4): 323–348.
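For readers weighing the reviewer's objection, here is a minimal sketch of the "same learning data" concern. It is a toy simulation under loud assumptions, not the procedure described above: a plain correlation screen stands in for Boruta, an absolute Pearson correlation stands in for the mixed-effects model, and every predictor is pure noise. The function names and default values (`simulate_screening`, `n`, `p`, `k`, `seed`) are hypothetical and chosen only for illustration.

```python
import random


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def simulate_screening(n=100, p=200, k=5, seed=1):
    """Screen k of p pure-noise predictors on one half of the data, then
    measure their association with y on that same half and on the other half.

    Returns (same_data_strength, fresh_data_strength), each the mean
    absolute correlation of the k selected predictors with the outcome.
    """
    rng = random.Random(seed)
    y = [rng.gauss(0, 1) for _ in range(n)]
    # Each predictor is independent noise, so no true effect exists.
    predictors = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(p)]

    half = n // 2
    selection_rows = range(half)   # data used for variable selection
    fresh_rows = range(half, n)    # data never seen by the selection step

    def strength(column, rows):
        return abs(pearson([column[i] for i in rows], [y[i] for i in rows]))

    # Stage 1 (stand-in for Boruta): keep the k strongest-looking predictors.
    ranked = sorted(range(p),
                    key=lambda j: strength(predictors[j], selection_rows),
                    reverse=True)
    selected = ranked[:k]

    # Stage 2 (stand-in for the parametric model): re-assess the survivors.
    same = sum(strength(predictors[j], selection_rows) for j in selected) / k
    fresh = sum(strength(predictors[j], fresh_rows) for j in selected) / k
    return same, fresh


if __name__ == "__main__":
    same, fresh = simulate_screening()
    print(f"mean |r| on the data used for selection: {same:.3f}")
    print(f"mean |r| on held-out data:               {fresh:.3f}")
```

In this toy setting, the selected predictors look clearly associated with the outcome on the data used to select them, and much less so on the held-out half, even though none of them has a real effect. Whether a similar inflation matters for a Boruta-then-mixed-model workflow depends on the data and the inferential goal. The sketch only shows why reusing the same data for both stages can be questioned, and that splitting the data is one possible safeguard.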
