Diana Maynard
The University of Sheffield | Sheffield · Department of Computer Science (Faculty of Engineering)

PhD


Publications: 194
Reads: 56,747
Citations: 6,997

Publications (194)
Book
This book introduces core natural language processing (NLP) technologies to non-experts in an easily accessible way, as a series of building blocks that lead the user to understand key technologies, why they are required, and how to integrate them into Semantic Web applications. Natural language processing and Semantic Web technologies have differe...
Chapter
This chapter focuses on the status of the English language, primarily acting as a benchmark for the level of technological support that other European languages could receive (see Maynard et al. 2022; Ananiadou et al. 2012). While it is rather unlikely that any other European language will ever reach this level, due to the continuing development of...
Preprint
The standard paradigm for fake news detection mainly utilizes text information to model the truthfulness of news. However, the discourse of online fake news is typically subtle and it requires expert knowledge to use textual information to debunk fake news. Recently, studies focusing on multimodal fake news detection have outperformed text-only met...
Book
The UNESCO-supported publication features over 100 recommendations for action and practical new tools to help fight a global scourge that threatens journalists’ safety and poisons democratic discourse. The study, spanning three years and representing collaborative research in 15 countries, is the most geographically, linguistically, and ethnically...
Poster
The calculation of environmental impacts from recipes remains a barrier to effective uptake of sustainable diets. In our project, we use pilot digital humanities methods to explore digitised recipe texts from websites in English, Dutch and German. Using the natural language processing toolkit GATE [1], we have developed customised tools to automati...
Research
The United Nations Educational, Scientific and Cultural Organization (UNESCO) has published research produced by the International Center for Journalists (ICFJ) as part of a major interdisciplinary study being led by ICFJ's research team. The research, the most comprehensive of its kind, shows that the disturbing trend of online violence - from doxx...
Article
The explosion of disinformation accompanying the COVID-19 pandemic has overloaded fact-checkers and media worldwide and posed a major new challenge for government responses. Not only is disinformation creating confusion about medical science amongst citizens, but it is also amplifying distrust in policy makers and governments. To help t...
Article
In this paper, we discuss the use of natural language processing (NLP) and artificial intelligence (AI) to analyse nutritional and sustainability aspects of recipes and food. We present the state of the art and some use cases, followed by a discussion of challenges. Our perspective on addressing these is that while they typically have a technical n...
Chapter
Traditionally, there has been a disconnect between custom-built applications used to solve real-world information extraction problems in industry, and automated learning-based approaches developed in academia. Despite approaches such as transfer-based learning, adapting these to more customised solutions where the task and data may be different, an...
Article
Understanding knowledge co-creation in key emerging areas of European research is critical for policy makers wishing to analyze impact and make strategic decisions. However, purely data-driven methods for characterising policy topics have limitations relating to the broad nature of such topics and the differences in language and topic structure bet...
Preprint
The explosion of disinformation related to the COVID-19 pandemic has overloaded fact-checkers and media worldwide. To help tackle this, we developed computational methods to support COVID-19 disinformation debunking and social impacts research. This paper presents: 1) the currently largest available manually annotated COVID-19 disinformation catego...
Article
The explosion of disinformation related to the COVID-19 pandemic has overloaded fact-checkers and media worldwide. To help tackle this, we developed computational methods to support COVID-19 disinformation debunking and social impacts research. This paper presents: 1) the currently largest available manually annotated COVID-19 disinformation catego...
Conference Paper
The commercial pressure on media has increasingly dominated the institutional rules of news media, and consequently, more and more sensational and dramatized frames and biases are in evidence in newspaper articles. Increased bias in the news media, which can result in misunderstanding and misuse of facts, leads to polarized opinions which can heavi...
Article
Sustainable Development Goal (SDG) indicator 16.10.1 proposes an important monitoring agenda for the global recording of a range of violations against journalists as a means to prevent attacks on the communicative functions of journalism. However, the need for extensive collection of data on violations against journalists raises a number of methodo...
Conference Paper
This paper describes the participation of team “bertha-von-suttner” in SemEval-2019 Task 4, Hyperpartisan News Detection. Our system uses sentence representations from averaged word embeddings generated from the pre-trained ELMo model with Convolutional Neural Networks and Batch Normalization for predicting hyperpartisan news. The final pre...
Conference Paper
Understanding knowledge co-creation in key emerging areas of European research is a critical issue for policy makers in order to analyse impact and make strategic decisions. However, current methods for characterising and visualising the field have limitations concerning the changing nature of research, the differences in language and topic structu...
Preprint
Societal debates and political outcomes are subject to news and social media influences, which are in turn subject to commercial and other forces. Local press are in decline, creating a "news gap". Research shows a contrary relationship between UK regions' economic dependence on EU membership and their voting in the 2016 UK EU membership referendum...
Article
Societal debates and political outcomes are subject to news and social media influences, which are in turn subject to commercial and other forces. Local press are in decline, creating a "news gap". Research shows a contrary relationship between UK regions' economic dependence on EU membership and their voting in the 2016 UK EU membership referendum...
Chapter
Many citizens nowadays flock to social media during crises to share or acquire the latest information about the event. Due to the sheer volume of data typically circulated during such events, it is necessary to be able to efficiently filter out irrelevant posts, thus focusing attention on the posts that are truly relevant to the crisis. Current met...
Conference Paper
Many citizens nowadays flock to social media during crises to share or acquire the latest information about the event. Due to the sheer volume of data typically circulated during such events, it is necessary to be able to efficiently filter out irrelevant posts, thus focusing attention on the posts that are truly relevant to the crisis. Current met...
Article
The Better Life Index (BLI), the measure of well-being proposed by the OECD, contains many metrics, which enable it to give a detailed overview of the social, economic, and environmental performance of different countries. However, this also makes it more difficult to evaluate the big picture. In order to overcome this, many composite BLI proced...
Article
Concerns have reached the mainstream about how social media are affecting political outcomes. One trajectory for this is the exposure of politicians to online abuse. In this paper we use 1.4 million tweets from the months before the 2015 and 2017 UK general elections to explore the abuse directed at politicians. Results show that abuse increased su...
Conference Paper
Concerns have reached the mainstream about how social media are affecting political outcomes. One trajectory for this is the exposure of politicians to online abuse. In this paper we use 1.4 million tweets from the months before the 2015 and 2017 UK general elections to explore the abuse directed at politicians. Results show that abuse increased su...
Article
Concerns have reached the mainstream about how social media are affecting political outcomes. One trajectory for this is the exposure of politicians to online abuse. In this paper we use 1.4 million tweets from the months before the 2015 and 2017 UK general elections to explore the abuse directed at politicians. This collection allows us to look at...
Article
Crisis responders are increasingly using social media, data and other digital sources of information to build a situational understanding of a crisis situation in order to design an effective response. However, with the increased availability of such data, the challenge of identifying relevant information from it also increases. This paper presents...
Preprint
Automatic Term Extraction (ATE) is a fundamental Natural Language Processing task used in many knowledge acquisition processes. It is a challenging NLP task due to its high domain dependence: no existing method consistently outperforms the others in all domains, and good ATE is very much an unsolved problem. We propose a generic method for improvin...
Article
Changing people’s behaviour with regard to energy consumption is often regarded as key to mitigating climate change. To this end, endless campaigns have been run by governments and environmental organisations to engage and raise awareness of the public, and to promote behaviour change. Nowadays, many such campaigns expand to social media, in the h...
Conference Paper
This paper describes ongoing work in the KNOWMAK project, which aims to develop a web-based tool providing interactive visualisations and state-of-the-art indicators on knowledge co-creation in the European research area. One of the main novel developments in this work is the use of ontologies to act as a bridge between the data sources (research pro...
Article
This paper presents a framework for collecting and analysing large volume social media content. The real-time analytics framework comprises semantic annotation, Linked Open Data, semantic search, and dynamic result aggregation components. In addition, exploratory search and sense-making are supported through information visualisation interfaces, su...
Book
The two volumes LNCS 10249 and 10250 constitute the refereed proceedings of the 14th Extended Semantic Web Conference, ESWC 2017, held in Portorož, Slovenia. The 51 revised full papers presented were carefully reviewed and selected from 183 submissions. In addition, 10 PhD papers are included, selected out of 14 submissions. The papers are or...
Book
The two volumes LNCS 10249 and 10250 constitute the refereed proceedings of the 14th Extended Semantic Web Conference, ESWC 2017, held in Portorož, Slovenia. The 51 revised full papers presented were carefully reviewed and selected from 183 submissions. In addition, 10 PhD papers are included, selected out of 14 submissions. The papers are or...
Chapter
Semantic annotations have diverse applications, such as semantic search—finding documents that mention one or more concepts/instances from an ontology/linked open data; constructing social semantic user models, including demographics, user interests, and online behavior; modeling online communities; and semantic-based information visualization. All...
Chapter
Having determined which expressions in text are mentions of entities, a follow-up task is entity linking (or entity disambiguation) [111]. It typically requires annotating a potentially ambiguous entity mention in a document (e.g., Paris) with a link to a canonical identifier describing a unique entity in a database or an ontology (e.g., http://dbp...
Chapter
The widespread adoption of social media is based on tapping into the social nature of human interactions, by making it possible for people to voice their opinion, become part of a virtual community and collaborate remotely. If we take micro-blogging as an example, Twitter has over 300 million active users, posting millions of tweets daily.
Article
Extracting information from Web pages for populating large, cross-domain knowledge bases requires methods which are suitable across domains, do not require manual effort to adapt to new domains, are able to deal with noise, and integrate information extracted from different Web pages. Recent approaches have used existing knowledge bases to learn to...
Conference Paper
While individual behaviour change is considered a central strategy to mitigate climate change, public engagement is still limited. Aiming to raise awareness, and to promote behaviour change, governments and organisations are conducting multiple pro-environmental campaigns, particularly via social media. However, to the best of our knowledge, these...
Conference Paper
This paper discusses the challenges in carrying out fair comparative evaluations of sentiment analysis systems. Firstly, these are due to differences in corpus annotation guidelines and sentiment class distribution. Secondly, different systems often make different assumptions about how to interpret certain statements, e.g. tweets with URLs. In orde...
Conference Paper
Social media has become the current trend in the information technology industry, acting as a collection of online communication channels for community-based input, interaction, content-sharing and collaboration, used all around the world. Microblogging platforms such as Facebook and Twitter are fast communication channels for information sharing a...
Conference Paper
This paper describes an open source framework for analysing large volume social media content, which comprises semantic annotation, Linked Open Data, semantic search, dynamic result aggregation, and information visualisation. In particular, exploratory search and sense-making are supported through information visualisation interfaces, such as co-oc...
Article
This paper describes the approach we take to the analysis of social media, combining opinion mining from text and multimedia (images, videos, etc.), and centred on entity and event recognition. We examine a particular use case, which is to help archivists select material for inclusion in an archive of social media for preserving community memories,...
Chapter
Connectivity and relatedness of Web resources are two concepts that define to what extent different parts are connected or related to one another. Measuring connectivity and relatedness between Web resources is a growing field of research, often the starting point of recommender systems. Although relatedness is liable to subjective interpretations,...
Conference Paper
Distantly supervised approaches have become popular in recent years as they allow training relation extractors without text-bound annotation, instead using known relations from a knowledge base and a large textual corpus from an appropriate domain. While state-of-the-art distant supervision approaches use off-the-shelf named entity recognition and...
Conference Paper
Extracting information from Web pages requires the ability to work at Web scale in terms of the number of documents, the number of domains and domain complexity. Recent approaches have used existing knowledge bases to learn to extract information with promising results. In this paper we propose the use of distant supervision for relation extraction...
Article
Applying natural language processing for mining and intelligent information access to tweets (a form of microblog) is a challenging, emerging research area. Unlike carefully authored news text and other longer content, tweets pose a number of new challenges, due to their short, noisy, context-dependent, and dynamic nature. Information extraction fr...
Article
The web and the social web play an increasingly important role as an information source for Members of Parliament and their assistants, journalists, political analysts and researchers. It provides crucial background information, such as reactions to political events and comments made by the general public. The case study presented in thi...
Article
In this paper, we describe a set of reusable text processing components for extracting opinionated information from social media, rating it for interestingness, and for detecting opinion events. We have developed applications in GATE to extract named entities, terms and events and to detect opinions about them, which are then used as the starting p...
Conference Paper
Sarcasm is a common phenomenon in social media, and is inherently difficult to analyse, not just automatically but often for humans too. It has an important effect on sentiment, but is usually ignored in social media analysis, because it is considered too tricky to handle. While there exist a few systems which can detect sarcasm, almost no work has...
Article
This chapter provides a high-level overview of the various NLP processes typically required for an ontology learning system, ranging from low-level linguistic pre-processing, through parsing and term recognition, to information extraction. Since ontology learning research tends to reuse many existing NLP tools, this chapter also discusses some of the m...
Conference Paper
This paper describes the approach we take to the analysis of social media, combining opinion mining from text and multimedia (images, videos, etc.), and centred on entity and event recognition. We examine a particular use case, which is to help archivists select material for inclusion in an archive of social media for preserving community memories...
Conference Paper
Connectivity and relatedness of Web resources are two concepts that define to what extent different parts are connected or related to one another. Measuring connectivity and relatedness between Web resources is a growing field of research, often the starting point of recommender systems. Although relatedness is liable to subjective interpretations,...
Conference Paper
Using semantic technologies for mining and intelligent information access to microblogs is a challenging, emerging research area. Unlike carefully authored news text and other longer content, tweets pose a number of new challenges, due to their short, noisy, context-dependent, and dynamic nature. Semantic annotation of tweets is typically performed...
Article
Twitter is the largest source of microblog text, responsible for gigabytes of human discourse every day. Processing microblog text is difficult: the genre is noisy, documents have little context, and utterances are very short. As such, conventional NLP tools fail when faced with tweets and other microblog text. We present TwitIE, an open-source NLP...
Article
We present ONTOCOM, a method to estimate the costs of ontology engineering, as well as project management tools that support the application of the method. ONTOCOM is part of a broader framework we have developed over the past five years, whose aim is to ...
Conference Paper
With the rapidly increasing pace at which Web content is evolving, particularly social media, preserving the Web and its evolution over time becomes an important challenge. Meaningful analysis of Web content lends itself to an entity-centric view to organise Web resources according to the information objects related to them. Therefore, the crucia...
Article
This paper describes a tool developed to improve access to the enormous volume of data housed at the UK's National Archives, both for the general public and for specialist researchers. The system we have developed, TNA-Search, enables a multi-paradigm search over the entire electronic archive (42TB of data in various formats). The search functional...
Article
Work on GATE has been partly supported by EPSRC grants GR/K25267 (Large-Scale
Article
While much work has recently focused on the analysis of social media in order to get a feel for what people think about current topics of interest, there are still many challenges to be faced. Text mining systems originally designed for more regular kinds of texts such as news articles may need to be adapted to deal with Facebook posts, t...
Article
This paper studies concept drift over time. We first define the meaning of a concept in terms of intension, extension and label. Then we study concept drift over time using two theories: one based on concept identity and one based on concept morphing. ...
Conference Paper
In this paper, we discuss a variety of issues related to opinion mining from microposts, and the challenges they pose for an NLP system, along with an example application we have developed to determine political leanings from a set of pre-election tweets. While there are a number of sentiment analysis tools available which summarise positive, nega...
Article
With the rapidly growing volume of resources on the Web, Web archiving becomes an important challenge. In addition, the notion of community memories extends traditional Web archives with related data from a variety of sources on the Social Web. Community memories take an entity-centric view to organise Web content according to the events and the en...
Article
This paper discusses and explores the main issues for evaluating ontology-based annotation tools, a key component in text mining applications for the Semantic Web. Semantic annotation and ontology-based information extraction technologies form the cornerstone of such applications. There has been a great deal of work in the last decade on evaluating...
Conference Paper
This book constitutes the refereed proceedings of the 8th International Semantic Web Conference, ISWC 2009, held in Chantilly, VA, USA, during October 25-29, 2009. The volume contains 43 revised full research papers selected from a total of 250 submissions; 15 papers out of 59 submissions to the Semantic Web In-Use track, and 7 papers and 12 poste...
Conference Paper
According to recent surveys, information workers send and receive an average of 133 messages per day, and users talk about "living" in email, spending an average of 21 percent of their time on it, as well as reporting general problems with overload. Information created by a business can represent either an asset or a liability, depending largely on...