Dawn Knight

Cardiff University | CU · School of English, Communication and Philosophy

Professor

81
Publications
8,079
Reads
459
Citations

Publications (81)
Article
Online communication via video platforms has become a standard component of workplace interaction for many businesses and employees. The rapid uptake in the use of virtual meeting platforms due to COVID-19 restrictions meant that many people had to quickly adjust to communication via this medium without much (if any) training as to how workplace co...
Article
Full-text available
Health messaging must be effective if it is to achieve audience adherence to guidance. Through the lens of Systemic Functional Linguistics, we examine the expression of obligation in poster-based health campaigns (4 posters) employed during the COVID-19 pandemic in the UK by considering whether differences in grammatical mood and modality values impact...
Article
Amid COVID-19 and the so-called “digital pivot”, online virtual communication is at the heart of our professional and private lives. As we move into a post-COVID context, the affordances of the digital turn have shown that we can operate professionally online but there is a need for better understanding of communication in the online workplace. Thi...
Article
Full-text available
Understanding what makes communication effective when designing public health messages is of key importance. This applies in particular to vaccination campaigns, which aim to encourage vaccine uptake, respond to vaccine hesitancy, and dispel myths and misinformation. This paper explores the ways in which the governments of Great Britain (Englan...
Technical Report
Full-text available
Communicating health threats: Linguistic evidence for effective public health messaging during the Covid-19 pandemic
Article
Full-text available
Understanding the reception of public health messages in public-facing communications is of key importance to health agencies in managing crises, pandemics, and other health threats. Established public health communications strategies including self-efficacy messaging, fear appeals, and moralising messaging were all used during the Coronavirus pand...
Preprint
Full-text available
Welsh is an official language in Wales and is spoken by an estimated 884,300 people (29.2% of the population of Wales). Despite this status and an estimated increase in speaker numbers since the last (2011) census, Welsh remains a minority language undergoing revitalization and promotion by Welsh Government and relevant stakeholders. As part of the ef...
Article
This study systematically reviews existing approaches to unsupervised grammar induction in terms of their theoretical underpinnings, practical implementations and evaluation. Our motivation is to identify the influence of functional-cognitive schools of grammar on language processing models in computational linguistics. This is an effort to fill an...
Chapter
Using a case study approach, this chapter provides a worked demonstration of how a detailed design frame for a national corpus may be built. Focusing on each individual ‘mode’ of data in turn (i.e. spoken, written and e-language), the chapter explores some key generic questions that are helpful in informing design frame development. By reporting on...
Chapter
Moving on to the final key stages of building a corpus, this chapter provides a brief exploration of approaches to processing and (re)presenting language data for future analysis. The chapter first details the importance of the transcription phase in spoken corpus development and documents how bespoke transcription conventions can be developed for,...
Chapter
This chapter details the final composition of each of CorCenCC’s sub-corpora, reflects on some of the practical issues and challenges faced when compiling these sub-corpora, and assesses the relative similarity (and differences) of the released corpus in comparison to its initial design (as presented in Chapter 2). The chapter effectively presents...
Chapter
The book aims to provide a micro-level, working model of a methodological approach to, and practical guidelines for, building a corpus informed by the work on the CorCenCC project. This first chapter provides the context to the work and lays foundations for the first step of corpus building, that is, the examination of the context of the language....
Chapter
Together with a consideration of the general types of text/data to be included in a corpus (as examined by the development of the design frames in Chapter 2), attention needs to be given to who will contribute data and how they will be recruited. This chapter focuses specifically on the stages of participant recruitment and (meta)data collection wh...
Article
Full-text available
CorCenCC (Corpws Cenedlaethol Cymraeg Cyfoes—National Corpus of Contemporary Welsh) is the first comprehensive corpus of Welsh designed to be reflective of language use across communication types, genres, speakers, language varieties (regional and social) and contexts. This article focuses on the computational infrastructure that we have designed t...
Article
Full-text available
Word embeddings are representations of words in a vector space that models semantic relationships between words by means of distance and direction. In this study, we adapted two existing methods, word2vec and fastText, to automatically learn Welsh word embeddings taking into account syntactic and morphological idiosyncrasies of this language. These...
Article
Full-text available
Cross-lingual embeddings are vector space representations where word translations tend to be co-located. These representations enable learning transfer across languages, thus bridging the gap between data-rich languages such as English and others. In this paper, we present and evaluate a suite of cross-lingual embeddings for the English–Welsh langu...
Chapter
Building a corpus entails the principled collection of a dataset, and corpora designed for general purposes commonly require the submission of that data to an annotation process whereby each item is ‘tagged’ according to its part of speech (POS). In some cases, a ready-made tag-set is applied to the data, and in other cases a bespoke tag-set is req...
Chapter
This chapter provides a blueprint for corpus design and construction in minoritised language contexts. This blueprint is not exhaustive, but functions to provide a working scaffold, alongside technical and systematic recommendations from major language contexts for other linguistic communities aspiring to construct their own corpora.
Chapter
In 2017, Welsh Government (WG) published an ambitious vision: to achieve a million Welsh speakers by 2050, almost doubling current numbers. The initial workplan to realise this aim, which includes a section on linguistic infrastructure, undertakes (by 2021) to: ‘support the production of more high-quality lexicographical, corpus and terminology res...
Chapter
In 2017, Welsh Government (WG) published an ambitious vision: to achieve a million Welsh speakers by 2050, almost doubling current numbers. The initial workplan to realise this aim, which includes a section on linguistic infrastructure, undertakes (by 2021) to: ‘support the production of more lexicographical, corpus and...
Chapter
Corpus creation in a minority language context poses interesting challenges, but also presents opportunities that are not always available to developers of corpora for larger languages. This book demonstrates how scrutiny of the unique context of a specific minoritised language, and meaningful collaboration with potential user groups, can determine the...
Chapter
This chapter provides a blueprint for corpus design and construction in minoritised language contexts. This blueprint is not exhaustive, but it provides a useful scaffold, alongside technical and systematic recommendations from major language contexts, for other linguistic communities wishing to create their own corpora.
Chapter
Corpus creation in a minoritised language context poses interesting challenges, but also presents opportunities that are not always available to developers of corpora for larger languages. This book demonstrates how scrutiny of the unique context of a specific minoritised language, and meaningful collaboration with potential user groups, can determ...
Chapter
Building a corpus entails the principled collection of a dataset, and corpora designed for general purposes commonly require the submission of that data to an annotation process whereby each item is ‘tagged’ according to its part of speech (POS). In some cases, a ready-made tag-set is applied to the data, and in other cases it is...
Chapter
As noted in Chapter 2.1, CorCenCC’s vision rested on two guiding principles: first, the language data captured within the corpus should be as representative as possible of the ways in which Welsh is currently used and encountered. Second, the corpus infrastructure should address the needs of the multiple community user groups operating in...
Chapter
As noted in Chap. 1.1, CorCenCC’s vision rested on two guiding principles: first, the language data captured within the corpus should be as representative as possible of the ways in which Welsh is currently used and encountered. Second, the corpus infrastructure should address the needs of the multiple community user groups operating in and/or enga...
Article
This bilingual book provides a detailed overview of the project to construct a National Corpus of Contemporary Welsh (CorCenCC), addressing the conceptual and methodological challenges faced when developing language corpora for minoritised languages. A conceptual framework is presented for the user-driven design that underpinned the CorCenCC projec...
Article
This book aims to provide a micro-level, working model of a methodological approach and practical guidelines for building a corpus, informed by the work on the CorCenCC project (Corpws Cenedlaethol Cymraeg Cyfoes - the National Corpus of Contemporary Welsh). It focuses specifically on the development of detailed design frames for corpora across com...
Conference Paper
According to Cognitive Grammar (CG) theory, the overall structure of a natural language is motivated by a relatively small set of domain-independent cognitive abilities. In this paper, we draw insights from CG to propose an approach to natural language parsing with little syntactic annotation. A sentence functions as a cohesive whole because its pa...
Preprint
This report provides an overview of the CorCenCC project and the online corpus resource that was developed as a result of work on the project. The report lays out the theoretical underpinnings of the research, demonstrating how the project has built on and extended this theory. We also raise and discuss some of the key operational questions that ar...
Article
Spoken corpora have traditionally been assembled through careful recording and transcription of discourse events, a process which is both labour intensive and often restrictive in terms of breadth of recording contexts available. To overcome these potential challenges in spoken corpus compilation, we explore the use of crowdsourcing of language sam...
Conference Paper
Full-text available
While the application of word embedding models to downstream Natural Language Processing (NLP) tasks has been shown to be successful, the benefits for low-resource languages are somewhat limited due to a lack of adequate data for training the models. However, NLP research efforts for low-resource languages have focused on constantly seeking ways to ha...
Conference Paper
Full-text available
This paper investigates the adaptation to Welsh of an existing system for multi-word term recognition that was originally developed for English. We give an overview of the modifications required, with a special focus on an important difference between the two representatives of two language families, Germanic and Celtic, which is concerned with the directionality of...
Conference Paper
Full-text available
Automatic semantic annotation of natural language data is an important task in Natural Language Processing, and a variety of semantic taggers have been developed for this task, particularly for English. However, for many languages, particularly for low-resource languages, such tools are yet to be developed. In this paper, we report on the developme...
Article
Text visualisations provide visual representations of documents or small corpora with the primary aim of supporting language analysis. We are interested in developing a more playful approach to language that can be characterised by the notion of wandering as an open-ended movement. To support such a casual form of engagement with text, we designed...
Article
The focus of this paper is on characterizing patterns of language use in online auction sites (the ‘e-marketplace’), with a specific focus on the language used in product descriptions listed on eBay. Through a corpus-based analysis of a sub-corpus of a 6.3-million-word corpus of eBay data, the authors aim to gain a better understanding of var...
Chapter
In this chapter, we consider how corpus linguistics (CL) and conversation analysis (CA) can be used together to provide enhanced understandings of spoken interaction in the context of small group teaching in higher education (HE). Following a description of the construction of the 1 million word NUCASE corpus (Newcastle Corpus of Academic Spoken En...
Article
Full-text available
There is currently an explosion in the number and range of new devices coming onto the technology market that use digital sensor technology to track aspects of human behaviour. In this article, we present and exemplify a three-stage model for the application of digital sensor technology in applied linguistics that we have developed, namely, Technol...
Chapter
The case study described in this chapter involves the incorporation of ‘nonlinguistic’ data streams in spoken corpus analysis. Here new possibilities are outlined for how we may relate language use to measurements of different aspects of context gathered from multiple sensors (especially, for example, of position, movement and time). Such alternati...
Chapter
Digital communication in the age of ‘web 2.0’ (that is, the second generation of the internet, which is focused on the growth of social media and driven by user-generated content) is becoming increasingly embedded into our daily lives. Defining, characterising and understanding the ways in which discourse is used to scaffold our existence in thi...
Article
This paper reports on the construction of the Cambridge and Nottingham e-language Corpus (CANELC). This corpus has been built as part of a collaborative project between the University of Nottingham and Cambridge University Press with whom sole copyright of the annotated corpus resides. CANELC comprises one-million words of digital English taken...
Chapter
This chapter provides a corpus-based analysis of formality in e-language. It examines how levels of formality differ from one ‘mode’ of e-language to the next, and how these collectively compare to spoken and written discourse, providing the foundations for enhancing our descriptions and understanding of e-language use. The chapter focuses on commo...
Chapter
‘Corpus Linguistics: Methods, Theory and Practice’ provides the reader with a good balance of detailed and interesting facts, figures and findings from the history and use of corpus analysis as well as in-depth discussions of the theoretical underpinnings of corpus linguistics. It documents how corpus linguistics perhaps ‘lives’ and ‘breathes’, and...
Article
Heterogeneous corpora are emergent multi-modal datasets which comprise a variety of different records of everyday communication, from SMS/MMS messages to interactions in virtual environments, and from GPS data to phone and video calls. By tracking a person's specific (inter)actions over time and place, the analysis of such “ubiquitous” corpora enab...
Article
Full-text available
This paper takes stock of the current state-of-the-art in multimodal corpus linguistics, and proposes some projections of future developments in this field. It provides a critical overview of key multimodal corpora that have been constructed over the past decade and presents a wish-list of future technological and methodological advancements that m...
Article
In this paper, we address a number of key methodological challenges and concerns faced by linguists in the development of a new generation of corpora: the multi-modal, multi-media corpus – that which combines video, audio and textual records of naturally occurring discourse. We contextualise these issues according to a research project which is cur...
Article
Current methodologies in corpus linguistics have revolutionised the way we look at language. They allow us to make objective observations about written and spoken language in use. However, most corpora are limited in scope because they are unable to capture language and communication beyond the word. This is problematic given that interaction is in...
Article
Full-text available
This paper explores the ways in which the Digital Replay System (DRS), innovative social science software constructed as part of the National Centre for e-Social Science's Digital Records for eSocial Science node, has facilitated research by two distinct groups of social-science-based end-users: learning scientists and corpus linguists. We discuss how DRS...
Article
Full-text available
Digital Replay System (DRS) is a software tool being developed by the DReSS node of the UK ESRC-funded National Centre for e-Social Science. It is publicly available under an open source license and is designed to support the organisation, synchronised replay, and analysis of complex multimodal corpora including audio, video, dialogue transcripts...
Article
This paper addresses some of the linguistic and technological procedures and requirements of the next generation of tools for the analysis of spoken linguistic corpora. It reports on preliminary developments of an ESRC funded interdisciplinary project at the University of Nottingham. It specifically focuses on key methodological and technical issue...
Article
This paper reports on the latest developments made as part of the ESRC funded Understanding Digital Records for eSocial Science Project (DReSS) at the University of Nottingham. Specifically, it reports on some of the issues and challenges that are currently being faced in compilation and use of heterogeneous multi-modal corpora comprised of heterog...
Article
Full-text available
The DReSS Research Node seeks to explore and understand how new forms of digital record may emerge from and for eSocial Science. This node complements existing nodes by examining how Grid-based technologies can be extended to provide new processes and services through which social science information may be collected, collated, and distributed. In...
Article
The difficulties associated with the development of spoken corpora large enough to yield stable analytical results have meant that much of corpus linguistics has focused on the analysis of written discourse. However, alongside the large-scale studies of lexico-grammar on the basis of mainly written corpora, there has been a consistent e...
Article
Throughout the development of corpus linguistics there has been a noticeable focus on analysing written language, and with written corpora now exceeding the one billion word mark, the possibilities for generating new insights into the way in which language is structured and used are both exciting and unprecedented. Spoken corpora, on th...
