Barbara McGillivray

  • PhD
  • Lecturer in digital humanities and cultural computation at King's College London

About

  • Publications: 117
  • Reads: 26,256
  • Citations: 1,547
Introduction
My background covers Mathematics (algebraic geometry in particular) and Classics, via Computational Linguistics (specifically, Latin Computational Linguistics). I am a research fellow at The Alan Turing Institute and the University of Cambridge.
Current institution
King's College London
Current position
  • Lecturer in digital humanities and cultural computation
Additional affiliations
October 2019 - present
Journal of Open Humanities Data
Position
  • Editor
August 2009 - May 2010
University of Bergen
Position
  • Research project
September 2008 - March 2009
Università Cattolica del Sacro Cuore
Position
  • Research project
Education
January 2007 - December 2010
University of Pisa
Field of study
  • Computational Linguistics
October 2004 - May 2006
University of Florence
Field of study
  • Classics
September 1999 - April 2004
University of Florence
Field of study
  • Mathematics

Publications

Article
Full-text available
This article proposes a methodology for combining natural language processing techniques for diachronic analysis and linguistic linked open data models to detect and represent semantic change. The change in meaning over time of words, phrases, or concepts encompasses complex phenomena that cannot be fully explained by distributional methods alone....
Article
Full-text available
The wider availability of large-scale datasets and reproducible algorithms has boosted the application of NLP to living languages. On the other hand, dead languages benefit from the availability of curated resources both to offset the sparseness of available data and to make data accessible to researchers. We present here AGVaLex, a computational v...
Article
Full-text available
Partaking in the editorial process of an academic journal is both a challenging and rewarding experience. It takes a village of dedicated individuals with a vested interest in the dissemination and sharing of high-quality research outputs. As members of the editorial team of an open access data journal, we reflect on the emergence of data-driven op...
Article
Full-text available
Computational methods have produced meaningful and usable results to study word semantics, including semantic change. These methods, belonging to the field of Natural Language Processing, have recently been applied to ancient languages; in particular, language modelling has been applied to Ancient Greek, the language on which we focus. In this cont...
Conference Paper
Full-text available
Word Sense Disambiguation (WSD) is an important task in NLP, which serves the purpose of automatically disambiguating a polysemous word with its most likely sense in context. Recent studies have advanced the state of the art in this task, but most of the work has been carried out on contemporary English or other modern languages, leaving challenges...
Article
Full-text available
We present the ‘Language of Mechanisation’ datasets with examples of re-use in visualisations and analysis. These reusable CSV files, published on the British Library’s Research Repository, contain automatically-transcribed text from 19th century British newspaper articles. Volunteers on the Zooniverse crowdsourcing platform took part in tasks that...
Article
Full-text available
COVID-19 has triggered innovations in science and society globally, leading to the emergence or establishment of formal neologisms such as infodemic and working from home (WFH). While previous work on COVID-related lexical innovation has focused on such formal neologisms, this paper uses data from Reddit to study semantic neologisms like lockdown...
Article
Full-text available
Journal editors have a large amount of power to advance open science in their respective fields by incentivising and mandating open policies and practices at their journals. The Data PASS Journal Editors Discussion Interface (JEDI, an online community for social science journal editors: www.dpjedi.org) has collated several resources on embedding o...
Conference Paper
Full-text available
We evaluate four count-based and predictive distributional semantic models of Ancient Greek against AGREE, a composite benchmark of human judgements, to assess their ability to retrieve semantic relatedness. On the basis of the observations deriving from the analysis of the results, we design a procedure for a larger-scale intrinsic evaluation of...
Article
Full-text available
This article examines a long-standing question in the history of technology concerning the trope of the living machine. The authors do this by using a cutting-edge computational method, which they apply to large collections of digitized texts. In particular, they demonstrate the affordances of a neural language model for historical research. In a d...
Preprint
Journal editors have a large amount of power to advance open science in their respective fields by incentivizing and mandating open policies and practices at their journals. The Data PASS Journal Editors Discussion Interface (JEDI, an online community for social science journal editors: www.dpjedi.org) has collated several resources on embedding op...
Article
Full-text available
The open research movement and initiatives like the FAIR principles have been critical in establishing the importance of data in research, particularly within the sciences. Alongside the sciences, attention to openly available data in Humanities and Social Sciences (HSS) research has gradually grown. This growth is largely attributed to the increas...
Preprint
Journal editors have a large amount of power to advance open science in their respective fields by incentivising and mandating open policies and practices at their journals. The Data PASS Journal Editors Discussion Interface (JEDI, an online community for social science journal editors: www.dpjedi.org) has collated several resources on open science...
Article
Full-text available
Historical linguistics is the study of language change and stability, of the history of individual languages, and of the relatedness between languages. In spite of numerous acknowledgements, the adoption of quantitative methods in historical linguistics is still far from being mainstream and it falls below the level of other branches of linguistics...
Conference Paper
Full-text available
The industrialization process associated with the so-called Industrial Revolution in 19th-century Great Britain was a time of profound changes, including in the English lexicon. An important yet understudied phenomenon is the semantic shift in the lexicon of mechanisation. In this paper we present the first large-scale analysis of terms related to...
Article
Full-text available
The humanities and social sciences (HSS) have recently witnessed an exponential growth in data-driven research. In response, attention has been afforded to datasets and accompanying data papers as outputs of the research and dissemination ecosystem. In 2015, two data journals dedicated to HSS disciplines appeared in this landscape: Journal of Open...
Preprint
Full-text available
Distributional semantics, the quantitative study of meaning variation and change through corpus collocations, is currently one of the most productive research areas in computational linguistics. The wider availability of big data and of reproducible algorithms for analysis has boosted its application to living languages in recent years. But can we...
Article
Full-text available
Multi-disciplinary and inter-disciplinary collaboration can be an appropriate response to tackling the increasingly complex problems faced by today’s society. Scientific disciplines are not rigidly defined entities and their profiles change over time. No previous study has investigated multiple disciplinarity (i.e. the complex interaction between d...
Article
Full-text available
We present a new corpus-based resource and methodology for the annotation of Latin lexical semantics, consisting of 2,399 annotated passages of 40 lemmas from the Latin diachronic corpus LatinISE. We also describe how the annotation was designed, analyse annotators’ styles, and present the preliminary results of a study on the lexical semantics and...
Conference Paper
Full-text available
Abstract for long paper, DH Benelux 2022: RE-MIX. Creation and alteration in DH (Hybrid), 1-3 June 2022. Research in computational linguistics has made successful attempts at modelling word meaning at scale, but much remains to be done to put these computational models to the test of historical scholarship (see e.g. Beelen et al. 2021). More impor...
Article
Full-text available
This paper presents an overview of the LL(O)D and NLP methods, tools and data for detecting and representing semantic change, with its main application in humanities research. The paper’s aim is to provide the starting point for the construction of a workflow and set of multilingual diachronic ontologies within the humanities use case of the COST A...
Book
Full-text available
This handbook aims to support higher education institutions with the integration of FAIR-related content in their curricula and teaching. It was written and edited by a group of about 40 collaborators in a series of six book sprint events that took place between 1 and 10 June 2021. The document provides practical material, such as competence profil...
Article
Full-text available
Lexical semantic change (detecting shifts in the meaning and usage of words) is an important task for social and cultural studies as well as for Natural Language Processing applications. Diachronic word embeddings (time-sensitive vector representations of words that preserve their meaning) have become the standard resource for this task. However, g...
Chapter
Change and its precondition, variation, are inherent in languages. Over time, new words enter the lexicon, others become obsolete, and existing words acquire new senses. Associating a word with its correct meaning in its historical context is a central challenge in diachronic research. Historical corpora of classical languages, such as Ancient Gree...
Conference Paper
The paper proposes an interdisciplinary approach including methods from disciplines such as history of concepts, linguistics, natural language processing (NLP) and Semantic Web, to create a comparative framework for detecting semantic change in multilingual historical corpora and generating diachronic ontologies as linguistic linked open data (LLOD...
Conference Paper
Full-text available
As languages evolve historically, making computational approaches sensitive to time can improve performance on specific tasks. In this work, we assess whether applying historical language models and time-aware methods helps with determining the correct sense of polysemous words. We outline the task of time-sensitive Targeted Sense Disambiguation (TS...
Preprint
Over time, new words enter the language, others become obsolete, and existing words acquire new meanings. The recent digitization efforts have now made it possible to access and mine digital collections of historical texts using automatic methods and investigate the question of semantic change over centuries. Easy access to very large born-digital...
Preprint
Full-text available
Lexical semantic change (detecting shifts in the meaning and usage of words) is an important task for social and cultural studies as well as for Natural Language Processing applications. Diachronic word embeddings (time-sensitive vector representations of words that preserve their meaning) have become the standard resource for this task. However, g...
Preprint
Full-text available
The semantics of emoji has, to date, been considered from a static perspective. We offer the first longitudinal study of how emoji semantics changes over time, applying techniques from computational linguistics to six years of Twitter data. We identify five patterns in emoji semantic development and find evidence that the less abstract an emoji is,...
Preprint
Full-text available
Word meaning is notoriously difficult to capture, both synchronically and diachronically. In this paper, we describe the creation of the largest resource of graded contextualized, diachronic word meaning annotation in four different languages, based on 100,000 human semantic proximity judgments. We thoroughly describe the multi-round incremental an...
Preprint
Full-text available
Change and its precondition, variation, are inherent in languages. Over time, new words enter the lexicon, others become obsolete, and existing words acquire new senses. Associating a word's correct meaning in its historical context is a central challenge in diachronic research. Historical corpora of classical languages, such as Ancient Greek and L...
Conference Paper
Full-text available
Lexical Semantic Change detection, i.e., the task of identifying words that change meaning over time, is a very active research area, with applications in NLP, lexicography, and linguistics. Evaluation is currently the most pressing problem in Lexical Semantic Change detection, as no gold standards are available to the community, which hinders prog...
Article
Full-text available
This article presents the result of accuracy tests for currently available Ancient Greek lemmatizers and recently published lemmatized corpora. We ran a blinded experiment in which three highly proficient readers of Ancient Greek evaluated the output of the cltk lemmatizer, of the cltk backoff lemmatizer, and of glem, together with the lemmatizatio...
Article
Full-text available
Traditional philological methods in Roman legal scholarship such as close reading and strict juristic reasoning have analysed law in extraordinary detail. Such methods, however, have paid less attention to the empirical characteristics of legal texts and occasionally projected an abstract framework onto the sources. The paper presents a series of c...
Chapter
Full-text available
This chapter presents an overview of the state of the art in the analysis of semantics phenomena in historical texts at scale, highlighting its critical aspects and proposing a new approach which joins together the expertise of computational specialists with that of humanities scholars. Semantic phenomena are grounded in linguistic, cognitive, soci...
Preprint
Full-text available
Lexical Semantic Change detection, i.e., the task of identifying words that change meaning over time, is a very active research area, with applications in NLP, lexicography, and linguistics. Evaluation is currently the most pressing problem in Lexical Semantic Change detection, as no gold standards are available to the community, which hinders prog...
Chapter
This chapter explains the concept of frequency, as well as various types of frequencies that can be measured in a text or in a collection of texts. Raw frequency and relative frequency are explained using the example of two short poems by the American poet Emily Dickinson, which demonstrates how frequency can be used to study the extent to which ce...
Chapter
Full-text available
This chapter explains the concept of collocation, as well as various metrics to identify collocations in a text. Collocations are sequences of strongly related words in a text; there is often an associative relationship between terms that form collocations. The chapter points out the relevance of collocation analysis for humanities through the conc...
Chapter
This chapter introduces quantitative methods for the analysis of word meaning, covering vector space models and the main concepts of distributional semantics. It presents a series of case studies illustrating the application of these techniques to real-world research questions, including analysis of Medical Officer of Health reports for London and...
Chapter
This concluding chapter focuses on some of the bridging concepts that help us translate qualitative research problems in the humanities into quantitative research goals to be addressed by language technology methods. Drawing on the use cases presented in the previous chapters, the authors point out why and how these bridging concepts can generate new insig...
Chapter
This chapter outlines the relevance of language technology for the exploration and study of big textual data sets in the humanities. We also discuss the importance of understanding the logic underlying the use of language technology to resolve research problems in the humanities. Finally, we outline the three pillars of the approach we follow throu...
Chapter
This chapter guides the reader through the key stages of creating language resources. After explaining the difference between linguistic corpora and other text collections, the authors briefly introduce the typology of corpora created by corpus linguists and the concept of corpus annotation. Basic terminology from natural language processing (NLP)...
Chapter
This chapter introduces the representation of texts as elements of feature spaces, as well as various exploratory tools to study such representations. It investigates how students of humanities can discover groups of topically similar texts in a large textual collection and how recurring themes giving rise to similarity can be detected. Concepts in...
Article
In spite of the increasingly large textual datasets humanities researchers are confronted with, and the need for automatic tools to extract information from them, we observe a lack of communication and diverging goals between the communities of Natural Language Processing (NLP) and Digital Humanities (DH). This contrasts with the wealth of potentia...
Article
Full-text available
Language is a complex and dynamic system. If we consider word meaning, which is the scope of lexical semantics, we observe that some words have several meanings, thus displaying lexical polysemy. In this article, we present the first phase of a project that aims at computationally modelling Ancient Greek semantics over time. Our system is based on...
Preprint
Full-text available
This paper proposes a new approach to animacy detection, the task of determining whether an entity is represented as animate in a text. In particular, this work is focused on atypical animacy and examines the scenario in which typically inanimate objects, specifically machines, are given animate attributes. To address it, we have created the first...
Preprint
Full-text available
As an online, crowd-sourced, open English-language slang dictionary, the Urban Dictionary platform contains a wealth of opinions, jokes, and definitions of terms, phrases, acronyms, and more. However, it is unclear exactly how activity on this platform relates to larger conversations happening elsewhere on the web, such as discussions on larger, mo...
Conference Paper
Full-text available
The choice of the corpus on which word embeddings are trained can have a sizable effect on the learned representations, the types of analyses that can be performed with them, and their utility as features for machine learning models. To contribute to the existing sets of pre-trained word embeddings, we introduce and release the first set of word em...
Article
Full-text available
Efforts to make research results open and reproducible are increasingly reflected by journal policies encouraging or mandating authors to provide data availability statements. As a consequence of this, there has been a strong uptake of data availability statements in recent literature. Nevertheless, it is still unclear what proportion of these stat...
Article
Full-text available
Open-ended survey data constitute an important basis in research as well as for making business decisions. Collecting and manually analysing free-text survey data is generally more costly than collecting and analysing survey data consisting of answers to multiple-choice questions. Yet free-text data allow for new content to be expressed beyond pred...
Conference Paper
Full-text available
A growing volume of heritage data is being digitized and made available as text via optical character recognition (OCR). Scholars and libraries are increasingly using OCR-generated text for retrieval and analysis. However, the process of creating text through OCR introduces varying degrees of error to the text. The impact of these errors on natural...
Book
“McGillivray and Tóth provide a very comprehensible introduction to the most important current approaches of computer-aided text analysis in the Digital Humanities. By giving illustrative examples and many practical tips, they let the reader participate in their vast experience in this quickly evolving field of research.”--Gregor Wiedemann, Univers...
Article
Full-text available
Our paper describes the creation and evaluation of a Distributional Semantics model of ancient Greek. We developed a vector space model where every word is represented by a vector which encodes information about its linguistic context(s). We validate different vector space models by testing their output against benchmarks obtained from scholarship...
Article
Full-text available
How do the level of usage of an article, the timeframe of its usage and its subject area relate to the number of citations it accrues? This paper aims to answer this question through an observational study of usage and citation data collected about the multidisciplinary, open access mega-journal Scientific Reports. This observational study answers...
Conference Paper
Full-text available
Semantic change detection (i.e., identifying words whose meaning has changed over time) started emerging as a growing area of research over the past decade, with important downstream applications in natural language processing, historical linguistics and computational social science. However, several obstacles make progress in the domain slow and d...
Article
Full-text available
The dataset covers the so-called “dative alternation”. The dative alternation (also referred to as the ditransitive or double-object construction) refers to parallel constructions that have broadly similar meaning but different syntax: i. “he gave it to the board”; ii. “I gave her my old one”. In i., the verb “give” takes a noun phrase (the pronoun...
Preprint
Full-text available
Word meaning changes over time, depending on linguistic and extra-linguistic factors. Associating a word's correct meaning in its historical context is a critical challenge in diachronic research, and is relevant to a range of NLP tasks, including information retrieval and semantic search in historical texts. Bayesian models for semantic change hav...
Preprint
Full-text available
Efforts to make research results open and reproducible are increasingly reflected by journal policies encouraging or mandating authors to provide data availability statements. As a consequence of this, there has been a strong uptake of data availability statements in recent literature. Nevertheless, it is still unclear what proportion of these stat...
Article
Full-text available
Natural Language Understanding (NLU) systems are essential components in many industry conversational Artificial Intelligence applications. There are strong incentives to develop a good NLU capability in such systems, both to improve the user experience, and in the case of regulated industries for compliance reasons. We report on a series of experi...
Preprint
Full-text available
How does usage of an article relate to the number of citations it accrues? Does the timeframe in which an article is used (and how much that article is used) have an effect on when and how much that article is cited? What role does an article's subject area play in the relationship between usage and citations? This paper aims to answer these questi...
Conference Paper
The ever-expanding wealth of digital material that researchers have at their disposal today, coupled with growing computing power, makes the use of quantitative methods in historical disciplines increasingly more viable. However, applying existing techniques and tools to historical datasets is not a trivial enterprise (Piotrowski, 2012; McGilli...
Conference Paper
Full-text available
Word meaning changes over time, depending on linguistic and extra-linguistic factors. Associating a word’s correct meaning in its historical context is a central challenge in diachronic research, and is relevant to a range of NLP tasks, including information retrieval and semantic search in historical texts. Bayesian models for semantic change have...
Conference Paper
Full-text available
In this paper we present a methodology based on distributional semantic models that can be flexibly adapted to the specific challenges posed by historical texts and that allow users to retrieve semantically relevant text without the need to close-read the documents. We focus on a case study concerned with detecting smell-related sentences in histor...
Article
Full-text available
Related data set “Diorisis Ancient Greek Corpus” with DOI https://www.doi.org/10.6084/m9.figshare.6187256 in repository “figshare”. The Diorisis Ancient Greek Corpus is a digital collection of ancient Greek texts (from Homer to the early fifth century AD) compiled for linguistic analyses, and specifically with the purpose of developing a computatio...
Chapter
Full-text available
Detecting significant linguistic shifts in the meaning and usage of words has gained more attention over the last few years. Linguistic shifts are especially prevalent on the Internet, where words’ meaning can change rapidly. In this work, we describe the construction of a large diachronic corpus that relies on the UK Web Archive and we propose a p...
Article
Full-text available
The Internet facilitates large-scale collaborative projects and the emergence of Web 2.0 platforms, where producers and consumers of content unify, has drastically changed the information market. On the one hand, the promise of the ‘wisdom of the crowd’ has inspired successful projects such as Wikipedia, which has become the primary source of crowd...
Preprint
Full-text available
Open-ended survey data constitute an important basis in research as well as for making business decisions. Collecting and manually analysing free-text survey data is generally more costly than collecting and analysing survey data consisting of answers to multiple-choice questions. Yet free-text data allow for new content to be expressed beyond pred...
Article
Full-text available
Background: Double-blind peer review has been proposed as a possible solution to avoid implicit referee bias in academic publishing. The aims of this study are to analyse the demographics of corresponding authors choosing double-blind peer review and to identify differences in the editorial outcome of manuscripts depending on their review m...
Conference Paper
Full-text available
We have created a Massive Open Online Course (MOOC) about dictionaries and dictionary-making, to be hosted by FutureLearn. This paper discusses the design and development of this course, which is pitched at high school and undergraduate level participants as well as language enthusiasts around the world. The MOOC will answer questions such as: how...
Chapter
Full-text available
A well-known feature of English grammar is the dative alternation, whereby a verb may be used in a V-NP-NP construction (Give me the money) or with a prepositional phrase in the pattern V-NP-PP, typically with the preposition to (Give the money to me). In this study, we use data from the Early-Access Subset (EAS) of the Spoken British National Corp...
Article
Full-text available
The Internet facilitates large-scale collaborative projects. The emergence of Web 2.0 platforms, where producers and consumers of content unify, has drastically changed the information market. On the one hand, the promise of the "wisdom of the crowd" has inspired successful projects such as Wikipedia, which has become the primary source of crowd-ba...
Preprint
Full-text available
Double-blind peer review has been proposed as a possible solution to avoid implicit referee bias in academic publishing. The aims of this study are to analyse the demographics of corresponding authors choosing double blind peer review, and to identify differences in the editorial outcome of manuscripts depending on their review model. Data includes...
Article
Full-text available
The present report summarizes an exploratory study which we carried out in the context of the COST Action IS1310 "Reassembling the Republic of Letters, 1500-1800", and which is relevant to the activities of Working Group 3 "Texts and Topics" and Working Group 2 "People and Networks". In this study we investigated the use of Natural Language Process...
Preprint
The Internet facilitates large-scale collaborative projects and the emergence of Web 2.0 platforms, where producers and consumers of content unify, has drastically changed the information market. On the one hand, the promise of the "wisdom of the crowd" has inspired successful projects such as Wikipedia, which has become the primary source of crowd...
Book
This book is an innovative guide to quantitative, corpus-based research in historical and diachronic linguistics. Gard B. Jenset and Barbara McGillivray argue that, although historical linguistics has been successful in using the comparative method, the field lags behind other branches of linguistics with respect to adopting quantitative methods. H...
Article
Full-text available
We have built a corpus-driven valency lexicon for Greek verbs by following an approach devised for Latin data. We have then used the lexicon to detect a specific type of potentially ambiguous syntactic patterns in Latin and Greek hexametric poetry, which can consistently be disambiguated by prosodic breaks. Such disambiguating breaks were then mapp...
Chapter
Cognitive linguistics has an honourable tradition of paying respect to naturally occurring language data and there have been fruitful interactions between corpus data and aspects of linguistic structure and meaning. More recently, dialect data and sociolinguistic data collection methods/theoretical concepts have started to generate interest. There...
