About
55 Publications
9,865 Reads
485 Citations
Introduction
My research interests lie at the intersection of Natural Language Processing, Knowledge Organisation and Representation, and the Semantic Web. I am particularly interested in information extraction, automatic metadata generation and semantic annotation with respect to conceptual reference models, ontologies and knowledge base resources. My research integrates innovations from the intersection of Computer Science and Information Science with the Humanities, addressing an ever-increasing need for tools and services capable of turning previously inaccessible text into conceptualised data. It combines text mining and knowledge representation methods to extract and represent knowledge currently "locked" in analogue or digitally inaccessible forms.
Current institution
Additional affiliations
Education
September 2007 - July 2012
September 2003 - September 2004
September 1999 - June 2003
Publications (55)
Purpose: Advancements in Internet technologies greatly influence digital humanities, yet research investigating web3 (i.e. the blockchain-based, decentralised web) within that domain remains limited. The purpose of this paper is to address that gap, presenting a state-of-the-art synthesis of web3-related technologies for digital humanities infrastru...
We present our solution to the problem of how to mobilise (that is, extract and enrich) digital data from the analogue, printed book version of Sir Hans Sloane’s copy of John Ray’s Historia Plantarum, to create the first searchable facility of its kind for the plants contained in the Sloane Herbarium, housed in the Natural History Museum UK. The data...
Given that AI systems are set to play a pivotal role in future decision-making processes, their trustworthiness and reliability are of critical concern. Due to their scale and complexity, modern AI systems resist direct interpretation, and alternative ways are needed to establish trust in those systems, and determine how well they align with human...
Purpose: This paper aims to explore the accelerations and constraints that libraries, archives, museums and heritage organisations (“collections-holding organisations”) face in their role as collection data providers for digital infrastructures. To date, digital infrastructures typically operate within the cultural heritage domain as data aggregation pla...
"The Sloane Lab: Looking back to build future shared collections" is a 3-year project funded by the UKRI Towards a National Collection (TaNC) programme. The programme aims to break down barriers between diverse collections to create a unified virtual national collection that would open up the heritage preserved in the United Kingdom to a glo...
The founding collection of the British Museum is a rich area in which to explore how dispersed heritage can be reconnected using state-of-the-art technologies. This is because the British Museum's original 1753 founding collection of Sir Hans Sloane is now split across three different institutions (the British Museum (BM), Natural History Museum (...
“The Sloane Lab: Looking back to build future shared collections” is a 3-year project funded by the UKRI Towards a National Collection programme. The project aims to re-establish connections between Sloane’s collections and catalogues and to mend the broken links between the past and present of the UK's founding collection in the catalogues of the...
Language technology is becoming increasingly important across a variety of application domains, many of which have become commonplace for large, well-resourced languages. However, there is a danger that small, under-resourced languages are being increasingly pushed to the technological margins. Under-resourced languages face significant challenges in delive...
Purpose: By mapping out the capabilities, challenges and limitations of named-entity recognition (NER), this article aims to synthesise the state of the art of NER in the context of the early modern research field and to inform discussions about the kind of resources, methods and directions that may be pursued to enrich the application of the techni...
Author Accepted Manuscript: Creative Commons Attribution-NonCommercial 4.0 International Licence (CC BY-NC 4.0). Purpose: Named Entity Recognition (NER) can enhance the (re)search capabilities of digitised documents and infrastructure; it can also open new possibilities for the interlinking of digitised documents with wider knowledge domains and r...
The advancement of Natural Language Processing (NLP) allows the process of deriving information from large volumes of text to be automated, making text-based resources more discoverable and useful. Attention is turned to one of the most important, but traditionally difficult-to-access, resources in archaeology: the largely unpublished reports ge...
This paper investigates techniques for knowledge injection into word embeddings learned from large corpora of unannotated data. These representations are trained with word co-occurrence statistics and do not commonly exploit syntactic and semantic information from linguistic knowledge bases, which potentially limits their transferability to domains...
Recent work and publications concerning sustainable water stewardship in Rajasthan (India) highlight how contemporary challenges are eroding traditional, communal approaches to water stewardship through mechanised extraction beyond the renewable capacities of ecosystems. Our work is focused on developing a formal ontology for modelling the knowledg...
We present semantics-based mechanisms that aim to promote reflection on cultural heritage by means of dates (historical events or annual commemorations), owing to their connections to a collection of items and to the visitors’ interests. We argue that links to specific dates can trigger curiosity, increase retention and guide visitors around the ve...
The largely unpublished reports generated by commercial or “rescue” archaeology, commonly known as “grey literature”, contain a great deal of untapped information, highly relevant to the research and analysis of archaeological evidence. The presentation describes experiences and challenges in using Natural Language Processing techniques for "unlocking...
Recent advances in semantic web and deep learning technologies enable new means for the computational analysis of vast amounts of information from the field of digital humanities. We discuss how some of the techniques can be used to identify historical and cultural symmetries between different characters, locations, events or venues, and how these...
This paper describes a working example of semantically modelling cultural heritage information and data from the National Gallery collection in London. The paper discusses the process of semantically representing and enriching the available cultural heritage data, and reveals the challenges of semantically expressing interrelations and groupings am...
This study investigates the semantic integration of data extracted from archaeological datasets with information extracted via natural language processing (NLP) across different languages. The investigation follows a broad theme relating to wooden objects and their dating via dendrochronological techniques, including types of wooden material, sampl...
Modern advances in digital technologies provide wider access to information, enabling new ways of interacting with and understanding cultural heritage information and facilitating its presentation, access and reinterpretation. The paper presents a working example of connecting and mapping cultural heritage information and data from cultural heri...
This report presents the CrossCult digital datasets of the four project pilots. It contains a description of the methods and data structures used to semantically model and ingest the digital resources of the pilots into the CrossCult Knowledge Base following the semantics of the CrossCult Upper-level ontology, a set of examples of semantic enrichmen...
This paper presents the Upper-level Ontology and the other ontological schemas and vocabularies that we used to model the semantics of the “world” of CrossCult and its four pilots. It consists of two documents: a report describing the rationale and structure of the ontology and a PDF file containing the definitions of the classes and properties of...
The need for organising, sharing and digitally processing Cultural Heritage (CH) information has led to the development of formal knowledge representation models (ontologies) for the CH domain. Based on RDF and OWL, the standard data model and ontology language of the Semantic Web, ontologies such as CIDOC-CRM, the Europeana Data Model and VRA, off...
CrossCult is an EU-funded research project aiming to spur a change in the way European citizens appraise History, fostering the re-interpretation of what they may have learnt in the light of cross-border interconnections among pieces of cultural heritage, other citizens’ viewpoints and physical venues. Exploiting the expressive power, reasoning and...
Research e-infrastructures, digital archives, and data services have become important pillars of scientific enterprise that in recent decades has become ever more collaborative, distributed, and data intensive. The archaeological research community has been an early adopter of digital tools for data acquisition, organization, analysis, and present...
Research e-infrastructures, digital archives and data services have become important pillars of scientific enterprise that in recent decades has become ever more collaborative, distributed and data-intensive. The archaeological research community has been an early adopter of digital tools for data acquisition, organisation, analysis and presentatio...
This document is a deliverable (D16.4) of the ARIADNE project (“Advanced Research Infrastructure for Archaeological Dataset Networking in Europe”), which is funded under the European Community's Seventh Framework Programme. It presents the final results of the work carried out in Task 16.2 “Natural Language Processing (NLP)”. The report presents o...
The report presents a collaborative effort of the four pilots, which took place in the first six months (M1-M6) of the project and focused on: 1) refining the original scenarios, 2) capturing the requirements, 3) defining the evaluation framework, 4) identifying the contributing technologies, 5) specifying the core gameplay for the four pilots and...
Purpose: The purpose of this paper is to present the role and contribution of natural language processing techniques, in particular negation detection and word sense disambiguation, in the process of Semantic Annotation of Archaeological Grey Literature. Archaeological reports contain a great deal of information that conveys facts and findings in d...
The article presents a method for automatic semantic indexing of archaeological grey‐literature reports using empirical (rule‐based) Information Extraction techniques in combination with domain‐specific knowledge organization systems. The semantic annotation system (OPTIMA) performs the tasks of Named Entity Recognition, Relation Extraction, Negati...
The Digital Research & Development Fund for the Arts in Wales is a partnership between Arts Council of Wales, the Arts & Humanities Research Council (AHRC) and Nesta. The Fund’s overarching purpose is “to enable the use of digital technologies in the arts sector to engage audiences in new ways and to create opportunities for new business models”. T...
Archaeological reports contain a great deal of information that conveys facts and findings in different ways. This kind of information is highly relevant to the research and analysis of archaeological evidence but at the same time can be a hindrance for the accurate indexing of documents with respect to positive assertions. The paper presents a met...
The paper describes the use of Information Extraction (IE), a Natural Language Processing (NLP) technique, to assist ‘rich’ semantic indexing of diverse archaeological text resources. Such unpublished online documents are often referred to as ‘Grey Literature’. Established document indexing techniques are not sufficient to satisfy user information n...
The paper discusses the application of Natural Language Processing (NLP) techniques in the context of semantic annotation of classical art text via rule-based Information Extraction (IE) techniques combined with ontological and domain vocabulary input. The CASIE (Classical Art Semantics Information Extraction) was a pilot collaborative project betw...
The paper discusses a prototype investigation of semantic annotation, a form of metadata assigning conceptual entities to textual instances, in this case archaeological grey literature. The use of Information Extraction (IE), a Natural Language Processing (NLP) technique, is central to the annotation process, while the use of Knowledge Organizatio...
The volume of archaeological reports being produced since the introduction of PG161 has significantly increased, as a result of the increased volume of archaeological investigations conducted by academic and commercial archaeology. It is highly desirable to be able to search effectively within and across such reports in order to find information th...
This paper discusses the automatic generation of rich metadata for semantic search of reports of archaeological excavations. An extension of the CIDOC CRM for the archaeological domain acts as a core ontology. This enables cross search between diverse excavation datasets and ‘grey literature’ excavation reports originating from the Archaeological D...
The paper discusses the process of developing Semantic Annotations, a form of metadata for assigning conceptual entities to textual instances, in this case archaeological grey literature. The use of Information Extraction (IE), a Natural Language Processing (NLP) technique, is central to the annotation process. The paper explores the use of Ontology...
Open Access http://intarch.ac.uk/journal/issue30/tudhope_index.html
Differing terminology and database structures hinder meaningful cross search of excavation datasets. Matching free-text grey literature reports with datasets poses yet more challenges. Conventional search techniques are unable to cross search between archaeological datasets and W...
Editor's Summary: Research data in archaeology is being made more accessible through the semantic efforts of the STAR and STELLAR projects of two United Kingdom universities. The goal of STAR (Semantic Technologies for Archaeological Resources) is to facilitate semantic interoperability, enabling a structured semantic search of five databases and gr...
Purpose: This paper sets out to discuss the use of information extraction (IE), a natural language‐processing (NLP) technique, to assist “rich” semantic indexing of diverse archaeological text resources. The focus of the research is to direct a semantic‐aware “rich” indexing of diverse natural language resources with properties capable of satisfying...
Outcomes from the STAR Project are presented. The underlying rationale is the need to widen access to archaeological datasets, which will allow third parties to cross search different datasets and investigate the basis for interpretations in the underlying data. The semantic technologies employed are based on standard representations of domain voca...