Tony McEnery's research while affiliated with Lancaster University and other places

Publications (127)

Book
How might evidence of language use – writing and speech – be used as a way of studying language? Corpus linguistics is the study of linguistic data from a particular language or set of languages. It is a fast-moving approach to studying language, and there is still a degree of divergence in how research questions are approached using corpus data. T...
Article
How might evidence of language use – writing and speech – be used as a way of studying language? Corpus linguistics is the study of linguistic data from a particular language or set of languages. It is a fast-moving approach to studying language, and there is still a degree of divergence in how research questions are approached using corpus data. T...
Article
This paper applies a new approach to the identification of discourses, based on Multiple Correspondence Analysis (MCA), to the study of discourse variation over time. The MCA approach to keywords deals with a major issue with the use of keywords to identify discourses: the allocation of individual keywords to multiple discourses. Yet, as this paper...
Article
Full-text available
This paper combines evidence from the analyses of large sets of newspaper material and long-term rainfall records to gain insights into representations of drought events in the United Kingdom, between 1800 and 2014. More specifically, we bring together two different, though complementary, approaches to trace longitudinal patterns in the ways drough...
Book
How do violent jihadists use language to try to persuade people to carry out violent acts? This book analyses over two million words of texts produced by violent jihadists to identify and examine the linguistic strategies employed. Taking a mixed methods approach, the authors combine quantitative methods from corpus linguistics, which allows the id...
Article
This study examines how patients use narratives to evaluate their experiences of healthcare services online. The analysis draws on corpus linguistic techniques, specifically annotation, applying Labov and Waletzky’s (1967) framework to a sample of online comments about the NHS in England. Narratives are pervasive in this context, being present more...
Article
Full-text available
On the surface, it appears that conversational language is produced in a stream of spoken utterances. In reality conversation is composed of contiguous units that are characterized by coherent communicative purposes. A large number of important research questions about the nature of conversational discourse could be addressed if researchers could i...
Article
Full-text available
The British National Corpus 2014 is a major project led by Lancaster University to create a 100-million-word corpus of present day British English. This corpus has been constructed as a comparable counterpart of the original British National Corpus (referred to as the BNC1994 in this article), which was compiled in the early 1990s. This article sta...
Article
This article introduces a new method for grouping keywords and examines the extent to which it also allows analysts to explore the interaction of discourse and subregister. It uses the multivariate statistical technique, Multiple Correspondence Analysis, to reveal dimensions of keywords which co-occur across the texts of a corpus. These dimensions...
Article
Full-text available
This article explores the language of violent jihad, focussing upon lexis encoding concepts from Islam. Through the use of correlation statistics, this article demonstrates that the words encoding such concepts distribute in dependent relationships across different types of texts. The correlation between the words cannot be simply explained in term...
Chapter
This chapter explores the use of #LancsBox to explore collocations and colligations in the framework of lexicogrammar. The chapter presents lexicogrammar and relates it to collocation and colligation, noting how those concepts are best viewed as part of a continuum rather than as discrete concepts. We then use #LancsBox to look at lexicogrammar fro...
Chapter
This chapter reports on research resulting from academics from linguistics, history and geography working together in order to cast light upon the geography of prostitution in seventeenth-century Britain. We will demonstrate the usefulness and untapped potential of combining corpus linguistics and Geographical Information Systems (GIS) as an approa...
Article
Full-text available
This article introduces a methodology for the diachronic analysis of large historical corpora, Usage Fluctuation Analysis (UFA). UFA looks at the fluctuation of the usage of a word as observed through collocation. It presupposes neither a commitment to a specific semantic theory, nor that the results will focus solely on semantics. We focus, rather...
Article
This paper introduces a new corpus resource for language learning research, the Trinity Lancaster Corpus (TLC), which contains 4.2 million words of interaction between L1 and L2 speakers of English. The corpus includes spoken production from over 2,000 L2 speakers from different linguistic and cultural backgrounds at different levels of proficiency...
Article
Full-text available
The implicit association test (IAT) measures bias towards often controversial topics (e.g., race, religion), while newspapers typically take strong positive/negative stances on such issues. In a pre-registered study, we developed and administered an immigration IAT to readers of the Daily Mail (a typically anti-immigration publication) and the Guar...
Article
Full-text available
In this article we explore the relationship between learner corpus and second language acquisition research. We begin by considering the origins of learner corpus research, noting its roots in smaller scale studies of learner language. This development of learner corpus studies is considered in the broader context of the development of corpus lingu...
Article
Full-text available
This article focuses on how register considerations informed and guided the design of the spoken component of the British National Corpus 2014 (Spoken BNC2014). It discusses why the compilers of the corpus sought to gather recordings from just one broad spoken register – ‘informal conversation’ – and how this and other design decisions afforded con...
Chapter
In this chapter, the authors introduce corpus-assisted discourse studies (CADS), a means of using the methods of corpus linguistics to facilitate discourse analysis of large volumes of textual data. The chapter uses this framework not only to demonstrate the value of CADS but also to explore the importance of repeating studies over time to test the...
Article
Full-text available
This article explores and critically evaluates the potential contribution to discourse studies of topic modelling, a group of machine learning methods which have been used with the aim of automatically discovering thematic information in large collections of texts. We critically evaluate the utility of the thematic grouping of texts into ‘topics’ e...
Preprint
The implicit association test (IAT) measures bias towards often controversial topics (e.g., race, religion), while newspapers typically take strong positive/negative stances on such issues. In a pre-registered study, we developed and administered an immigration IAT to readers of the Daily Mail (a typically anti-immigration publication) and the Guar...
Article
This paper introduces the Spoken British National Corpus 2014, an 11.5-million-word corpus of orthographically transcribed conversations among L1 speakers of British English from across the UK, recorded in the years 2012-2016. After showing that a survey of the recent history of corpora of spoken British English justifies the compilation of this ne...
Article
Full-text available
In this article we explore public discourse around one marginalized group in early-modern English society, men who engaged in sexual relations with other males. To do this we use a large corpus of seventeenth century texts, the Early English Books Online corpus. Our exploration leads us to consider a number of methodological issues, notably low fre...
Article
Full-text available
Language acquisition occupies a central place in the study of human cognition, and research on how we learn language can be found across many disciplines, from developmental psychology and linguistics to education, philosophy, and neuroscience. It is a very challenging topic to investigate given that the learning target in first and second language...
Article
Full-text available
This article contributes to the debate about the appropriate use of corpus data in language learning research. It focuses on frequencies of linguistic features in language use and their comparison across corpora. The majority of corpus-based second language acquisition studies employ a comparative design in which either one or more second language...
Article
Full-text available
This article focuses on the use of collocations in language learning research (LLR). Collocations, as units of formulaic language, are becoming prominent in our understanding of language learning and use; however, while the number of corpus-based LLR studies of collocations is growing, there is still a need for a deeper understanding of factors tha...
Article
Full-text available
The article discusses epistemic stance in spoken L2 production. Using a subset of the Trinity Lancaster Corpus of spoken L2 production, we analysed the speech of 132 advanced L2 speakers from different L1 and cultural backgrounds taking part in four speaking tasks: one largely monologic presentation task and three interactive tasks. The study focu...
Article
The idea that text in a particular field of discourse is organized into lexical patterns, which can be visualized as networks of words that collocate with each other, was originally proposed by Phillips (1983). This idea has important theoretical implications for our understanding of the relationship between the lexis and the text and (ultimately)...
Article
Full-text available
This article analyses reaction to the ideologically inspired murder of a soldier, Lee Rigby, in central London by two converts to Islam, Michael Adebowale and Michael Adebolajo. The focus of the analysis is upon the contrast between how the event was reacted to by the UK National Press and on social media. To explore this contrast, we undertook a c...
Chapter
In this chapter we examine discourses on the social media site Twitter around people who receive government support (commonly referred to as benefits), in the UK. Between 2008–2009 and 2011–2012, the UK experienced recession, and after coming to power in 2010 the Conservative-led coalition government embarked on a program of fiscal austerity that i...
Conference Paper
Full-text available
Sublanguages are varieties of language that form "subsets" of the general language, typically exhibiting particular types of lexical, semantic, and other restrictions and deviance. SubCAT, the Sublanguage Corpus Analysis Toolkit, assesses the representativeness and closure properties of corpora to analyze the extent to which they are either sublang...
Article
Full-text available
Sublanguages are varieties of language that form "subsets" of the general language, typically exhibiting particular types of lexical, semantic, and other restrictions and deviance. SubCAT, the Sublanguage Corpus Analysis Toolkit, assesses the representativeness and closure properties of corpora to analyze the extent to which they are either sublang...
Article
This second edition of The Oxford Handbook of Computational Linguistics has been substantially revised, updated, and expanded. Alongside updated accounts of the topics covered in the first edition, it includes 17 new chapters on subjects such as deep learning, word representation, semantic role labelling, translation technology, opinion mining and...
Chapter
The rapid development of corpus linguistics since the early 1990s has revolutionized virtually all areas of linguistic research. Research in grammar has probably been influenced most profoundly by the corpus-based approach. It has helped to redefine what a grammar is. Indeed, corpora have had such a strong impact on recently published reference gra...
Article
Full-text available
This article uses methods from corpus linguistics and critical discourse analysis to examine patterns of representation around the word Muslim in a 143 million word corpus of British newspaper articles published between 1998 and 2009. Using the analysis tool Sketch Engine, an analysis of noun collocates of Muslim found that the following categories...
Article
The modern field of corpus linguistics – based around the computer-aided analysis of extremely large databases of text – is largely a phenomenon of the late 1950s onwards. Its early history was marked by opposition from, in particular, Noam Chomsky, who favored a rationalist view over the empiricism associated with corpus-based approaches. However,...
Book
Full-text available
Is the British press prejudiced against Muslims? In what ways can prejudice be explicit or subtle? This book uses a detailed analysis of over 140 million words of newspaper articles on Muslims and Islam, combining corpus linguistics and discourse analysis methods to produce an objective picture of media attitudes. The authors analyse representation...
Article
Full-text available
This paper focuses upon two issues. Firstly, the question of identifying diachronic trends, and more importantly significant outliers, in corpora which permit an investigation of a feature at many sampling points over time. Secondly, we consider how best to combine more qualitatively oriented approaches to corpus data with the type of trends that c...
Book
Corpus linguistics is the study of language data on a large scale – the computer-aided analysis of very extensive collections of transcribed utterances or written texts. This textbook outlines the basic methods of corpus linguistics, explains how the discipline of corpus linguistics developed and surveys the major approaches to the use of corpus da...
Article
Full-text available
This article discusses the extent to which critical discourse analysts can effectively use the methods normally employed in corpus linguistics. Our research is based on the analysis of a 140-million-word corpus made up of British press news items dealing with refugees, seekers of...
Article
Full-text available
The presentation reports on the outcomes of the ESRC-funded project, Presentation of Islam and Muslims in the UK press, 1998-2009. The project used a corpus-based approach, while also being informed by moral panic theory (Cohen, 1972), and notions central to Critical Discourse Analysis (e.g. Reisigl & Wodak, 2001). The project used a corpus of 143...
Article
This book is concerned with cross-linguistic contrast of major grammatical categories in English and Chinese, two most important yet genetically different world languages. This genetic difference has resulted in many subsidiary differences that are, among other things, related to grammar. Compared with typologically related languages, cross-linguis...
Article
Full-text available
In this paper, we describe the construction of the 14-million-word Nepali National Corpus (NNC). This corpus includes both spoken and written data, the latter incorporating a Nepali match for FLOB and a broader collection of text. Additional resources within the NNC include parallel data (English–Nepali and Nepali–English) and a speech corpus. The...
Article
Full-text available
This article discusses the extent to which methods normally associated with corpus linguistics can be effectively used by critical discourse analysts. Our research is based on the analysis of a 140-million-word corpus of British news articles about refugees, asylum seekers, immigrants and migrants (collectively RASIM). We discuss how processes such...
Conference Paper
Full-text available
Refugees, asylum seekers, and immigrants (henceforth RASIM) coming into the UK have attracted increased press attention (Greenslade, 2005). As their representation in the press can construct their identity (Duffy and Rowden, 2005: 6, in Greenslade, 2005: 7), the discourses surrounding these groups have been the focus of linguistic studies (e.g. ter...
Chapter
The use of statistics in linguistics is increasingly common. This article reviews some basic statistics that are of use to linguists and introduces some more advanced statistics used in general and computational linguistics.
Article
Telicity is an important concept in the study of aspect. While the compatibility tests with completive and durative adverbials have long been in operation as a diagnostic for telicity, their validity and reliability have rarely been questioned. This article critically explores the validity and reliability of such tests and discusses such temporal expr...
Article
For decades, passives as a major grammatical category in both English and Chinese have been subject to much research, both corpus-based and non-corpus-based. A number of contrastive studies of passives in English and Chinese have been published, but they have not used corpus data, being based, rather, on a handful of examples which are common to ne...
Article
This paper explores the collocational behaviour and semantic prosody of near synonyms from a cross-linguistic perspective. The importance of these concepts to language learning is well recognized. Yet while collocation and semantic prosody have recently attracted much interest from researchers studying the English language, there has been little wo...
Chapter
Full-text available
Introduction; The Nature of Corpus Linguistics; Debates in Corpus Linguistics; Lexicogrammar and Lexical Grammar; Corpus Studies; Reference Works; Language Teaching; Language Change; Conclusion
Article
In this paper I use a corpus of the writings of the Society for the Reformation of Manners to look at the discursive construction of attitudes to bad language in English. Using this corpus of texts as an example of a moral panic about language I use keywords to explore moral panic rhetoric, the formation of spirals of signification and the impact o...
Article
A corpus-based analysis of discourses of refugees and asylum seekers was carried out on data taken from a range of British newspapers and texts from the Office of the United Nations High Commissioner for Refugees website, both published in 2003. Concordances of the terms refugee(s) and asylum seeker(s) were examined and grouped along patterns which...
Article
Automatic extraction of multiword expressions (MWEs) presents a tough challenge for the NLP community and corpus linguistics. Indeed, although numerous knowledge-based symbolic approaches and statistically driven algorithms have been proposed, efficient MWE extraction still remains an unsolved issue. In this paper, we evaluate the Lancaster UCREL S...
Conference Paper
Full-text available
Semantic lexical resources play an important part in both corpus linguistics and NLP. Over the past 14 years, a large semantic lexical resource has been built at Lancaster University. Different from other major semantic lexicons in existence, such as WordNet, EuroWordNet and HowNet, etc., in which lexemes are clustered and linked via the relationsh...
Chapter
Full-text available
This paper reports on the compilation, and ongoing mark up and annotation, of a corpus of MA dissertations written by students at the Department of Linguistics and English Language, Lancaster University. The main focus of the paper is a preliminary investigation comparing the use of epistemic modality by native and advanced non-native speakers of Eng...
Article
Do men use bad language more than women? How do social class and the use of bad language interact? Do young speakers use bad language more frequently than older speakers? Using the spoken section of the British National Corpus, Swearing in English explores questions such as these and considers at length the historical origins of modern attitudes to...
Article
This paper describes the work carried out on the EMILLE Project (Enabling Minority Language Engineering), which was undertaken by the Universities of Lancaster and Sheffield. The primary resource developed by the project is the EMILLE Corpus, which consists of a series of monolingual corpora for fourteen South Asian languages, totalling more than 9...
Article
This paper first discusses standards for developing Asian language corpora so as to facilitate international data exchange. Following this, we present two corpora of Asian languages developed at Lancaster University – the EMILLE Corpus, which contains 14 South Asian languages, and the Lancaster Corpus of Mandarin Chinese. Finally, we will demonstr...
Conference Paper
Full-text available
Semantic lexical resources play an important part in both linguistic study and natural language engineering. In Lancaster, a large semantic lexical resource has been built over the past 14 years, which provides a knowledge base for the USAS semantic tagger. Capturing semantic lexicological theory and empirical lexical usage information extracted fr...
Article
Full-text available
In this paper we describe the Lancaster Speech, Writing and Thought Presentation (SW&TP2) Spoken Corpus. We have constructed this corpus to investigate the ways in which speakers present speech, thought and writing in contemporary spoken British English, with the associated aim of comparing our findings with the patterns revealed by the previous...
Conference Paper
Full-text available
The UCREL semantic analysis system (USAS) is a software tool for undertaking the automatic semantic analysis of English spoken and written data. This paper describes the software system, and the hierarchical semantic tag set containing 21 major discourse fields and 232 fine-grained semantic field tags. We discuss the manually constructed lexical re...
Conference Paper
Full-text available
Annotation schemes for semantic field analysis use abstract concepts to classify words and phrases in a given text. The use of such schemes within lexicography is increasing. Indeed, our own UCREL semantic annotation system (USAS) is to form part of a web-based 'intelligent' dictionary (Herpiö 2002). As USAS was originally designed to enable automa...
Article
Full-text available
As reported by Wilson and Rayson (1993) and Rayson and Wilson (1996), the UCREL semantic analysis system (USAS) has been designed to undertake the automatic semantic analysis of present-day English (henceforth PresDE) texts. In this paper, we report on the feasibility of (re)training the USAS system to cope with English from earlier periods, specif...
Conference Paper
Full-text available
Text reuse is commonplace in academia and the media. An efficient algorithm for automatically detecting and measuring similar/related texts would have applications in corpus linguistics, historical studies and natural language engineering. In an effort to explore the issue of text reuse, a tool, named Crouch, has been developed based on the TESA...
Conference Paper
Full-text available
Semantic annotation is an important and challenging issue in corpus linguistics and language engineering. While such a tool is available for English in Lancaster (Wilson and Rayson 1993), few such tools have been reported for other languages. In a joint Benedict project funded by the European Community under the ‘Information Society Technologies Pr...
Article
The EMILLE Project (Enabling Minority Language Engineering) was established to construct a 67 million word corpus of South Asian languages. In addition, the project has had to address a number of issues related to establishing a language engineering (LE) environment for South Asian language processing, such as translating 8-bit language data into U...