Sophia Ananiadou

The University of Manchester · School of Computer Science

Doctor of Philosophy

About

439
Publications
93,362
Reads
10,998
Citations

Publications (439)
Article
The evolution of the Exposome concept revolutionised the research in exposure assessment and epidemiology by introducing the need for a more holistic approach to the exploration of the relationship between the environment and disease. At the same time, further and more dramatic changes have also occurred in the working environment, adding to the al...
Article
Biomedical text summarization is a critical task for comprehension of an ever-growing amount of biomedical literature. Pre-trained language models (PLMs) with transformer-based architectures have been shown to greatly improve performance in biomedical text mining tasks. However, existing methods for text summarization generally fine-tune PLMs on th...
Article
Full-text available
Background Nested and overlapping events are particularly frequent and informative structures in biomedical event extraction. However, state-of-the-art neural models either neglect those structures during learning or use syntactic features and external tools to detect them. To overcome these limitations, this paper presents and compares two neural...
Article
Full-text available
Introduction: Suicide is a global health concern. Sociocultural factors have an impact on self-harm and suicide rates. In Pakistan, both self-harm and suicide are considered criminal offences and are condemned on both religious and social grounds. The proposed intervention 'Youth Culturally Adapted Manual Assisted Problem Solving Training (YCM...
Article
Full-text available
Stress and depression detection on social media aims at the analysis of stress and identification of depression tendency from social media posts, which provides assistance for the early detection of mental health conditions. Existing methods mainly model the mental states of the post speaker implicitly. They also lack the ability to mentalise for com...
Article
Full-text available
Mental illness is highly prevalent nowadays, constituting a major cause of distress in people's lives with impact on society's health and well-being. Mental illness is a complex multi-factorial disease associated with individual risk factors and a variety of socioeconomic and clinical associations. In order to capture these complex associations express...
Preprint
Full-text available
Negation and uncertainty modeling are long-standing tasks in natural language processing. Linguistic theory postulates that expressions of negation and uncertainty are semantically independent from each other and the content they modify. However, previous works on representation learning do not explicitly model this independence. We therefore attem...
Article
Gender bias is an important problem that affects models of natural language, and the propagation of such biases could be harmful. Much research focuses on gender biases in word embeddings, and there are also some works on gender biases in subsequent tasks. However, very limited prior work has been done on gender issues in emotion detection tasks. I...
Preprint
Recently, the Transformer model, which has achieved great success in many artificial intelligence fields, has demonstrated its great potential in modeling graph-structured data. To date, a great variety of Transformers have been proposed to adapt to graph-structured data. However, a comprehensive literature review and systematic evaluation of the...
Preprint
Click-Through Rate (CTR) prediction is an essential component of online advertising. The mainstream techniques mostly focus on feature interaction or user interest modeling, which rely on users' directly interacted items. The performance of these methods is usually impeded by inactive behaviours and system's exposure, incurring that the features e...
Article
Full-text available
Machine reading (MR) is essential for unlocking valuable knowledge contained in millions of existing biomedical documents. Over the last two decades [1,2], the most dramatic advances in MR have followed in the wake of critical corpus development [3]. Large, well-annotated corpora have been associated with punctuated advances in MR methodology and aut...
Article
Large scale pre-trained language models (PLMs) have advanced state-of-the-art (SOTA) performance on various biomedical text mining tasks. The power of such PLMs can be combined with the advantages of deep generative models. These are examples of these combinations. However, they are trained only on general domain text, and biomedical models are sti...
Article
Full-text available
The COVID-19 pandemic resulted in an unprecedented production of scientific literature spanning several fields. To facilitate navigation of the scientific literature related to various aspects of the pandemic, we developed an exploratory search system. The system is based on automatically identified technical terms, document citations, and their vi...
Conference Paper
Full-text available
Research in Text Simplification (TS) has relied mostly on the Wikipedia-based datasets and the SARI evaluation metric, as the preferred means for creating and evaluating new simplification methods. Previous studies have pointed out the flaws of data evaluation resources, including incorrect alignment of simple/complex sentence pairs, sentences with...
Conference Paper
Full-text available
Predicting and understanding how various mental health conditions present online in textual social media data has become an increasingly popular task. The main aim of using this type of data lies in utilising its findings to prevent future harm as well as to provide help. In this paper, we describe our approach and findings in participating in sub-...
Article
Full-text available
Textual Emotion Recognition (TER) is an important task in Natural Language Processing (NLP), due to its high impact in real-world applications. Prior research has tackled the automatic classification of emotion expressions in text by maximising the probability of the correct emotion class using cross-entropy loss. However, this approach does not ac...
Preprint
Modern text simplification (TS) heavily relies on the availability of gold standard data to build machine learning models. However, existing studies show that parallel TS corpora contain inaccurate simplifications and incorrect alignments. Additionally, evaluation is usually performed by using metrics such as BLEU or SARI to compare system output t...
Preprint
Full-text available
To interpret the genetic profile present in a patient sample, it is necessary to know which mutations have important roles in the development of the corresponding cancer type. Named entity recognition is a core step in the text mining pipeline which facilitates mining valuable cancer information from the scientific literature. However, due to the s...
Article
Full-text available
Suicide is one of the leading causes of death worldwide. At the same time, the widespread use of social media has led to an increase in people posting their suicide notes online. Therefore, designing a learning model that can aid the detection of suicide notes online is of great importance. However, current methods cannot capture both local and glo...
Preprint
Full-text available
We propose a multi-task, probabilistic approach to facilitate distantly supervised relation extraction by bringing closer the representations of sentences that contain the same Knowledge Base pairs. To achieve this, we bias the latent space of sentences via a Variational Autoencoder (VAE) that is trained jointly with a relation classifier. The late...
Article
Background A barrier to practicing evidence-based medicine is the rapidly increasing body of biomedical literature. Use of method terms to limit the search can help reduce the burden of screening articles for clinical relevance; however, such terms are limited by their partial dependence on indexing terms and usually produce low precision, especial...
Preprint
Full-text available
Semantic search engines, which integrate the output of text mining (TM) methods, can significantly increase the ease and efficiency of finding relevant documents and locating important information within them. We present a novel search engine for the construction industry, HSEarch (http://www.nactem.ac.uk/hse/), which uses TM methods to provide sem...
Chapter
Semantic search engines, which integrate the output of text mining (TM) methods, can significantly increase the ease and efficiency of finding relevant documents and locating important information within them. We present a novel search engine for the construction industry, HSEarch (http://www.nactem.ac.uk/hse/), which uses TM methods to provide sem...
Preprint
Full-text available
Emotion recognition (ER) is an important task in Natural Language Processing (NLP), due to its high impact in real-world applications from health and well-being to author profiling, consumer analysis and security. Current approaches to ER mainly classify emotions independently without considering that emotions can co-exist. Such approaches overloo...
Preprint
Full-text available
Machine reading is essential for unlocking valuable knowledge contained in the millions of existing biomedical documents. Over the last two decades [1,2], the most dramatic advances in machine reading have followed in the wake of critical corpus development [3]. Large, well-annotated corpora have been associated with punctuated advances in machine re...
Article
Full-text available
Objective To determine how the representation of women’s health has changed in clinical studies over the course of 70 years. Design Observational study of 71 866 research articles published between 1948 and 2018 in The BMJ. Main outcome measures The incidence of women-specific health topics over time. General linear, additive and segmented regre...
Article
Full-text available
Motivation: Recent neural approaches to event extraction from text mainly focus on flat events in the general domain, while there are fewer attempts to detect nested and overlapping events. These existing systems are built on given entities and they depend on external syntactic tools. Results: We propose an end-to-end neural nested event extraction m...
Article
Full-text available
Most deep language understanding models depend only on word representations, which are mainly based on language modelling derived from a large amount of raw text. These models encode distributional knowledge without considering syntactic information, although several studies have shown benefits of including such information. Therefore, we propose n...
Article
Full-text available
We present our approach for the identification of cited text spans in scientific literature, using pre-trained encoders (BERT) in combination with different neural networks. We further experiment to assess the impact of using these cited text spans as input in BERT-based extractive summarisation methods. Inspired and motivated by the CL-SciSumm sha...
Preprint
Unsupervised relation extraction (URE) extracts relations between named entities from raw text without manually-labelled data and existing knowledge bases (KBs). URE methods can be categorised into generative and discriminative approaches, which rely either on hand-crafted features or surface form. However, we demonstrate that by using only named e...
Article
Full-text available
Here we investigate the evolutionary dynamics of several kinds of modern cultural artefacts—pop music, novels, the clinical literature and cars—as well as a collection of organic populations. In contrast to the general belief that modern culture evolves very quickly, we show that rates of modern cultural evolution are comparable to those of many an...
Preprint
Full-text available
Background The number of published in vivo studies and the speed at which researchers are publishing them make it virtually impossible to follow recent developments in the field. Systematic review emerged as a method to summarise and analyse the studies quantitatively and critically but it is often out-of-date due to its lengthy process. Method We invit...
Article
Full-text available
Sometimes the normal course of events is disrupted by a particularly swift and profound change. Historians have often referred to such changes as “revolutions”, and, though they have identified many of them, they have rarely supported their claims with statistical evidence. Here, we present a method to identify revolutions based on a measure of mul...
Article
Full-text available
Background: Clinical Named Entity Recognition aims to find the names of diseases, body parts and other related terms in a given text. Because the Chinese language is quite different from the English language, a machine cannot simply obtain the graphical and phonetic information from Chinese characters. The method for Chinese should be different from that...
Article
Full-text available
Background: Machine learning can assist with multiple tasks during systematic reviews to facilitate the rapid retrieval of relevant references during screening and to identify and extract information relevant to the study characteristics, which include the PICO elements of patient/population, intervention, comparator, and outcomes. The latter requi...
Preprint
Full-text available
We tackle the nested and overlapping event detection task and propose a novel search-based neural network (SBNN) structured prediction model that treats the task as a search problem on a relation graph of trigger-argument structures. Unlike existing structured prediction tasks such as dependency parsing, the task aims to detect DAG structures, w...
Preprint
Full-text available
Document-level relation extraction is a complex human process that requires logical inference to extract relationships between named entities in text. Existing approaches use graph-based neural models with words as nodes and edges as relations between them, to encode relations across sentences. These models are node-based, i.e., they form pair repr...
Article
Full-text available
Background Consisting of dictated free-text documents such as discharge summaries, medical narratives are widely used in medical natural language processing. Relationships between anatomical entities and human body parts are crucial for building medical text mining applications. To achieve this, we establish a mapping system consisting of a Wikip...
Article
Full-text available
Objective: Identification of drugs, associated medication entities, and interactions among them is crucial to prevent unwanted effects of drug therapy, known as adverse drug events. This article describes our participation in the n2c2 shared task in extracting relations between medication-related entities in electronic health records. Materials...
Article
Full-text available
The curation of neuroscience entities is crucial to ongoing efforts in neuroinformatics and computational neuroscience, such as those being deployed in the context of continuing large-scale brain modelling projects. However, manually sifting through thousands of articles for new information about modelled entities is a painstaking and low-reward ta...
Article
Full-text available
Objective: This article describes an ensembling system to automatically extract adverse drug events and drug related entities from clinical narratives, which was developed for the 2018 n2c2 Shared Task Track 2. Materials and methods: We designed a neural model to tackle both nested (entities embedded in other entities) and polysemous entities (e...
Preprint
Full-text available
Inter-sentence relation extraction deals with a number of complex semantic relationships in documents, which require local, non-local, syntactic and semantic dependencies. Existing methods do not fully exploit such dependencies. We present a novel inter-sentence relation extraction model that builds a labelled edge graph convolutional neural networ...
Preprint
When constructing models that learn from noisy labels produced by multiple annotators, it is important to accurately estimate the reliability of annotators. Annotators may provide labels of inconsistent quality due to their varying expertise and reliability in a domain. Previous studies have mostly focused on estimating each annotator's overall rel...
Article
Full-text available
Objectives Chronic obstructive pulmonary disease (COPD) phenotypes cover a range of lung abnormalities. To allow text mining methods to identify pertinent and potentially complex information about these phenotypes from textual data, we have developed a novel annotated corpus, which we use to train a neural network-based named entity recognizer to d...
Article
Full-text available
Objective: We seek to quantify the mortality risk associated with mentions of medical concepts in textual electronic health records (EHRs). Recognizing mentions of named entities of relevant types (eg, conditions, symptoms, laboratory tests or behaviors) in text is a well-researched task. However, determining the level of risk associated with them...
Preprint
Full-text available
The linguistic foundations of science and technology have relied on a range of terms many of which are borrowed from ancient languages, a known but little researched fact from a statistical perspective. Precise definitions and novel concepts are often crafted with those — frequently used — terms, yet their etymology from Greek or Latin might not al...
Article
Full-text available
The linguistic foundations of science and technology include many terms that have been borrowed from ancient languages. In the case of terms with origins in the Greek language, the modern meaning can often differ significantly from the original one. Here we use the PubMed database to demonstrate the prevalence of words of Greek origin in the langua...
Data
The 172 words with rich meaning (nouns, adjectives and verb forms) that appear in at least one million PubMed records. A part-of-speech tagger was used to classify the words under consideration (https://cst.dk/tools/index.php).
Data
A search string containing the 152 words with rich meaning and four or more characters that appear in at least one million PubMed records. The search that excludes the 15 Greek terms is generated automatically at the following URL: https://tinyurl.com/y7kflbcb
Data
The 243 words that appear in at least one million PubMed records.
Data
The 152 words with rich meaning and four or more characters that appear in at least one million PubMed records.
Preprint
Full-text available
We present a novel graph-based neural network model for relation extraction. Our model treats multiple pairs in a sentence simultaneously and considers interactions among them. All the entities in a sentence are placed as nodes in a fully-connected graph structure. The edges are represented with position-aware contexts around the entity pairs. In o...
Article
Full-text available
Background Species occurrence records are very important in the biodiversity domain. While several available corpora contain only annotations of species names or habitats and geographical locations, there is no consolidated corpus that covers all types of entities necessary for extracting species occurrence from biodiversity literature. In order to...
Data
Division for training named entity recognisers