Barbara McGillivray

King's College London | KCL · Department of Digital Humanities

PhD

About

94
Publications
15,337
Reads
755
Citations
Introduction
My background covers Mathematics (Algebraic geometry in particular) and Classics, via Computational Linguistics (specifically, Latin Computational Linguistics). I am a research fellow at The Alan Turing Institute and the University of Cambridge.
Additional affiliations
October 2019 - present
Journal of Open Humanities Data
Position
  • Editor
August 2009 - May 2010
University of Bergen
Position
  • Research project
September 2008 - March 2009
Catholic University of the Sacred Heart
Position
  • Research project
Education
January 2007 - December 2010
Università di Pisa
Field of study
  • Computational Linguistics
October 2004 - May 2006
University of Florence
Field of study
  • Classics
September 1999 - April 2004
University of Florence
Field of study
  • Mathematics

Publications (94)
Preprint
Full-text available
Distributional semantics, the quantitative study of meaning variation and change through corpus collocations, is currently one of the most productive research areas in computational linguistics. The wider availability of big data and of reproducible algorithms for analysis has boosted its application to living languages in recent years. But can we...
Article
Full-text available
Multi-disciplinary and inter-disciplinary collaboration can be an appropriate response to tackling the increasingly complex problems faced by today’s society. Scientific disciplines are not rigidly defined entities and their profiles change over time. No previous study has investigated multiple disciplinarity (i.e. the complex interaction between d...
Conference Paper
Full-text available
Abstract for long paper, DH Benelux 2022: RE-MIX. Creation and alteration in DH (Hybrid), 1-3 June 2022. Research in computational linguistics has made successful attempts at modelling word meaning at scale, but much remains to be done to put these computational models to the test of historical scholarship (see e.g. Beelen et al. 2021). More impor...
Article
Full-text available
This paper presents an overview of the LL(O)D and NLP methods, tools and data for detecting and representing semantic change, with its main application in humanities research. The paper’s aim is to provide the starting point for the construction of a workflow and set of multilingual diachronic ontologies within the humanities use case of the COST A...
Article
We present a new corpus-based resource and methodology for the annotation of Latin lexical semantics, consisting of 2,399 annotated passages of 40 lemmas from the Latin diachronic corpus LatinISE. We also describe how the annotation was designed, analyse annotators’ styles, and present the preliminary results of a study on the lexical semantics and...
Book
Full-text available
This handbook aims to support higher education institutions with the integration of FAIR-related content in their curricula and teaching. It was written and edited by a group of about 40 collaborators in a series of six book sprint events that took place between 1 and 10 June 2021. The document provides practical material, such as competence profil...
Article
Full-text available
Lexical semantic change (detecting shifts in the meaning and usage of words) is an important task for social and cultural studies as well as for Natural Language Processing applications. Diachronic word embeddings (time-sensitive vector representations of words that preserve their meaning) have become the standard resource for this task. However, g...
Chapter
Change and its precondition, variation, are inherent in languages. Over time, new words enter the lexicon, others become obsolete, and existing words acquire new senses. Associating a word with its correct meaning in its historical context is a central challenge in diachronic research. Historical corpora of classical languages, such as Ancient Gree...
Conference Paper
The paper proposes an interdisciplinary approach including methods from disciplines such as history of concepts, linguistics, natural language processing (NLP) and Semantic Web, to create a comparative framework for detecting semantic change in multilingual historical corpora and generating diachronic ontologies as linguistic linked open data (LLOD...
Conference Paper
Full-text available
As languages evolve historically, making computational approaches sensitive to time can improve performance on specific tasks. In this work, we assess whether applying historical language models and time-aware methods helps with determining the correct sense of polysemous words. We outline the task of time-sensitive Targeted Sense Disambiguation (TS...
Preprint
Over time, new words enter the language, others become obsolete, and existing words acquire new meanings. The recent digitization efforts have now made it possible to access and mine digital collections of historical texts using automatic methods and investigate the question of semantic change over centuries. Easy access to very large born-digital...
Preprint
Full-text available
Lexical semantic change (detecting shifts in the meaning and usage of words) is an important task for social and cultural studies as well as for Natural Language Processing applications. Diachronic word embeddings (time-sensitive vector representations of words that preserve their meaning) have become the standard resource for this task. However, g...
Preprint
Full-text available
The semantics of emoji has, to date, been considered from a static perspective. We offer the first longitudinal study of how emoji semantics changes over time, applying techniques from computational linguistics to six years of Twitter data. We identify five patterns in emoji semantic development and find evidence that the less abstract an emoji is,...
Preprint
Full-text available
Word meaning is notoriously difficult to capture, both synchronically and diachronically. In this paper, we describe the creation of the largest resource of graded contextualized, diachronic word meaning annotation in four different languages, based on 100,000 human semantic proximity judgments. We thoroughly describe the multi-round incremental an...
Preprint
Full-text available
Change and its precondition, variation, are inherent in languages. Over time, new words enter the lexicon, others become obsolete, and existing words acquire new senses. Associating a word's correct meaning in its historical context is a central challenge in diachronic research. Historical corpora of classical languages, such as Ancient Greek and L...
Conference Paper
Full-text available
Lexical Semantic Change detection, i.e., the task of identifying words that change meaning over time, is a very active research area, with applications in NLP, lexicography, and linguistics. Evaluation is currently the most pressing problem in Lexical Semantic Change detection, as no gold standards are available to the community, which hinders prog...
Article
Full-text available
This article presents the result of accuracy tests for currently available Ancient Greek lemmatizers and recently published lemmatized corpora. We ran a blinded experiment in which three highly proficient readers of Ancient Greek evaluated the output of the cltk lemmatizer, of the cltk backoff lemmatizer, and of glem, together with the lemmatizatio...
Article
Full-text available
Traditional philological methods in Roman legal scholarship such as close reading and strict juristic reasoning have analysed law in extraordinary detail. Such methods, however, have paid less attention to the empirical characteristics of legal texts and occasionally projected an abstract framework onto the sources. The paper presents a series of c...
Chapter
Full-text available
This chapter presents an overview of the state of the art in the analysis of semantic phenomena in historical texts at scale, highlighting its critical aspects and proposing a new approach which joins together the expertise of computational specialists with that of humanities scholars. Semantic phenomena are grounded in linguistic, cognitive, soci...
Preprint
Full-text available
Lexical Semantic Change detection, i.e., the task of identifying words that change meaning over time, is a very active research area, with applications in NLP, lexicography, and linguistics. Evaluation is currently the most pressing problem in Lexical Semantic Change detection, as no gold standards are available to the community, which hinders prog...
Chapter
This chapter explains the concept of frequency, as well as various types of frequencies that can be measured in a text or in a collection of texts. Raw frequency and relative frequency are explained using the example of two short poems by the American poet Emily Dickinson, which demonstrates how frequency can be used to study the extent to which ce...
Chapter
Full-text available
This chapter explains the concept of collocation, as well as various metrics to identify collocations in a text. Collocations are sequences of strongly related words in a text; there is often an associative relationship between terms that form collocations. The chapter points out the relevance of collocation analysis for humanities through the conc...
Chapter
This chapter introduces quantitative methods for the analysis of word meaning, covering vector space models and the main concepts of distributional semantics. It presents a series of case studies illustrating the application of these techniques to real-world research questions, including analysis of Medical Officer of Health reports for London and...
Chapter
This concluding chapter focuses on some of the bridging concepts that help us translate qualitative research problems in the humanities into quantitative research goals to be addressed by language technology methods. Drawing on the use cases presented in the previous chapters, the authors point out why and how these bridging concepts can generate new insig...
Chapter
This chapter outlines the relevance of language technology for the exploration and study of big textual data sets in the humanities. We also discuss the importance of understanding the logic underlying the use of language technology to resolve research problems in the humanities. Finally, we outline the three pillars of the approach we follow throu...
Chapter
This chapter guides the reader through the key stages of creating language resources. After explaining the difference between linguistic corpora and other text collections, the authors briefly introduce the typology of corpora created by corpus linguists and the concept of corpus annotation. Basic terminology from natural language processing (NLP)...
Chapter
This chapter introduces the representation of texts as elements of feature spaces, as well as various exploratory tools to study such representations. It investigates how students of humanities can discover groups of topically similar texts in a large textual collection and how recurring themes giving rise to similarity can be detected. Concepts in...
Article
In spite of the increasingly large textual datasets humanities researchers are confronted with, and the need for automatic tools to extract information from them, we observe a lack of communication and diverging goals between the communities of Natural Language Processing (NLP) and Digital Humanities (DH). This contrasts with the wealth of potentia...
Article
Full-text available
Language is a complex and dynamic system. If we consider word meaning, which is the scope of lexical semantics, we observe that some words have several meanings, thus displaying lexical polysemy. In this article, we present the first phase of a project that aims at computationally modelling Ancient Greek semantics over time. Our system is based on...
Preprint
Full-text available
This paper proposes a new approach to animacy detection, the task of determining whether an entity is represented as animate in a text. In particular, this work is focused on atypical animacy and examines the scenario in which typically inanimate objects, specifically machines, are given animate attributes. To address it, we have created the first...
Preprint
Full-text available
As an online, crowd-sourced, open English-language slang dictionary, the Urban Dictionary platform contains a wealth of opinions, jokes, and definitions of terms, phrases, acronyms, and more. However, it is unclear exactly how activity on this platform relates to larger conversations happening elsewhere on the web, such as discussions on larger, mo...
Conference Paper
Full-text available
The choice of the corpus on which word embeddings are trained can have a sizable effect on the learned representations, the types of analyses that can be performed with them, and their utility as features for machine learning models. To contribute to the existing sets of pre-trained word embeddings, we introduce and release the first set of word em...
Article
Full-text available
Efforts to make research results open and reproducible are increasingly reflected by journal policies encouraging or mandating authors to provide data availability statements. As a consequence of this, there has been a strong uptake of data availability statements in recent literature. Nevertheless, it is still unclear what proportion of these stat...
Article
Full-text available
Open-ended survey data constitute an important basis in research as well as for making business decisions. Collecting and manually analysing free-text survey data is generally more costly than collecting and analysing survey data consisting of answers to multiple-choice questions. Yet free-text data allow for new content to be expressed beyond pred...
Conference Paper
Full-text available
A growing volume of heritage data is being digitized and made available as text via optical character recognition (OCR). Scholars and libraries are increasingly using OCR-generated text for retrieval and analysis. However, the process of creating text through OCR introduces varying degrees of error to the text. The impact of these errors on natural...
Book
“McGillivray and Tóth provide a very comprehensible introduction to the most important current approaches of computer-aided text analysis in the Digital Humanities. By giving illustrative examples and many practical tips, they let the reader participate in their vast experience in this quickly evolving field of research.”--Gregor Wiedemann, Univers...
Article
Our paper describes the creation and evaluation of a Distributional Semantics model of ancient Greek. We developed a vector space model where every word is represented by a vector which encodes information about its linguistic context(s). We validate different vector space models by testing their output against benchmarks obtained from scholarship...
Article
Full-text available
How do the level of usage of an article, the timeframe of its usage and its subject area relate to the number of citations it accrues? This paper aims to answer this question through an observational study of usage and citation data collected about the multidisciplinary, open access mega-journal Scientific Reports. This observational study answers...
Conference Paper
Full-text available
Semantic change detection (i.e., identifying words whose meaning has changed over time) started emerging as a growing area of research over the past decade, with important downstream applications in natural language processing, historical linguistics and computational social science. However, several obstacles make progress in the domain slow and d...
Article
Full-text available
The dataset covers the so-called “dative alternation”. The dative alternation (also referred to as the ditransitive or double-object construction) refers to parallel constructions that have broadly similar meaning but different syntax: i. “he gave it to the board” ii. “I gave her my old one” In i., the verb “give” takes a noun phrase (the pronoun...
Preprint
Full-text available
Word meaning changes over time, depending on linguistic and extra-linguistic factors. Associating a word's correct meaning in its historical context is a critical challenge in diachronic research, and is relevant to a range of NLP tasks, including information retrieval and semantic search in historical texts. Bayesian models for semantic change hav...
Preprint
Full-text available
Efforts to make research results open and reproducible are increasingly reflected by journal policies encouraging or mandating authors to provide data availability statements. As a consequence of this, there has been a strong uptake of data availability statements in recent literature. Nevertheless, it is still unclear what proportion of these stat...
Article
Full-text available
Natural Language Understanding (NLU) systems are essential components in many industry conversational Artificial Intelligence applications. There are strong incentives to develop a good NLU capability in such systems, both to improve the user experience, and in the case of regulated industries for compliance reasons. We report on a series of experi...
Preprint
Full-text available
How does usage of an article relate to the number of citations it accrues? Does the timeframe in which an article is used (and how much that article is used) have an effect on when and how much that article is cited? What role does an article's subject area play in the relationship between usage and citations? This paper aims to answer these questi...
Conference Paper
The ever-expanding wealth of digital material that researchers have at their disposal today, coupled with growing computing power, makes the use of quantitative methods in historical disciplines increasingly more viable. However, applying existing techniques and tools to historical datasets is not a trivial enterprise (Piotrowski, 2012; McGilli...
Conference Paper
Full-text available
Word meaning changes over time, depending on linguistic and extra-linguistic factors. Associating a word’s correct meaning in its historical context is a central challenge in diachronic research, and is relevant to a range of NLP tasks, including information retrieval and semantic search in historical texts. Bayesian models for semantic change have...
Conference Paper
Full-text available
In this paper we present a methodology based on distributional semantic models that can be flexibly adapted to the specific challenges posed by historical texts and that allow users to retrieve semantically relevant text without the need to close-read the documents. We focus on a case study concerned with detecting smell-related sentences in histor...
Article
Full-text available
Related data set “Diorisis Ancient Greek Corpus” with DOI https://www.doi.org/10.6084/m9.figshare.6187256 in repository “figshare”. The Diorisis Ancient Greek Corpus is a digital collection of ancient Greek texts (from Homer to the early fifth century AD) compiled for linguistic analyses, and specifically with the purpose of developing a computatio...
Article
The Internet facilitates large-scale collaborative projects and the emergence of Web 2.0 platforms, where producers and consumers of content unify, has drastically changed the information market. On the one hand, the promise of the ‘wisdom of the crowd’ has inspired successful projects such as Wikipedia, which has become the primary source of crowd...
Preprint
Full-text available
Open-ended survey data constitute an important basis in research as well as for making business decisions. Collecting and manually analysing free-text survey data is generally more costly than collecting and analysing survey data consisting of answers to multiple-choice questions. Yet free-text data allow for new content to be expressed beyond pred...
Article
Full-text available
Background: Double-blind peer review has been proposed as a possible solution to avoid implicit referee bias in academic publishing. The aims of this study are to analyse the demographics of corresponding authors choosing double-blind peer review and to identify differences in the editorial outcome of manuscripts depending on their review model. Me...
Conference Paper
Full-text available
We have created a Massive Open Online Course (MOOC) about dictionaries and dictionary-making, to be hosted by FutureLearn. This paper discusses the design and development of this course, which is pitched at high school and undergraduate level participants as well as language enthusiasts around the world. The MOOC will answer questions such as: how...
Chapter
Full-text available
A well-known feature of English grammar is the dative alternation, whereby a verb may be used in a V-NP-NP construction (Give me the money) or with a prepositional phrase in the pattern V-NP-PP, typically with the preposition to (Give the money to me). In this study, we use data from the Early-Access Subset (EAS) of the Spoken British National Corp...
Article
Full-text available
The Internet facilitates large-scale collaborative projects. The emergence of Web 2.0 platforms, where producers and consumers of content unify, has drastically changed the information market. On the one hand, the promise of the "wisdom of the crowd" has inspired successful projects such as Wikipedia, which has become the primary source of crowd-ba...
Article
Full-text available
The present report summarizes an exploratory study which we carried out in the context of the COST Action IS1310 "Reassembling the Republic of Letters, 1500-1800", and which is relevant to the activities of Working Group 3 "Texts and Topics" and Working Group 2 "People and Networks". In this study we investigated the use of Natural Language Process...