
Derek Greene, BA Mod, PhD
University College Dublin | UCD · School of Computer Science
About
150 Publications
45,676 Reads
3,686 Citations
Introduction
Dr. Derek Greene is Assistant Professor at the UCD School of Computer Science, and a Funded Investigator at the Insight Centre for Data Analytics and the VistaMilk Research Centre. He has over 17 years' experience in the area of machine learning, with a PhD in Computer Science from Trinity College Dublin, and over 60 research papers presented at international conferences and published in journals. His current research focuses on the development of new methods for natural language processing and
Additional affiliations
August 2011 - present
Education
September 2003 - September 2006
October 1999 - May 2003
Publications (150)
The task of Query Performance Prediction (QPP) in Information Retrieval (IR) involves predicting the relative effectiveness of a search system for a given input query. Supervised approaches for QPP, such as NeuralQPP [23], are often trained on pairs of queries to capture their relative retrieval performance. However, point-wise approaches, such as t...
Paper presentation at ECIR-2022
Many recent deep learning-based solutions have widely adopted the attention-based mechanism in various tasks of the NLP discipline. However, the inherent characteristics of deep learning models and the flexibility of the attention mechanism increase the models' complexity, thus leading to challenges in model explainability. In this paper, to addres...
When studying large research corpora, “distant reading” methods are vital to understand the topics and trends in the corresponding research space. In particular, given the recognised benefits of multidisciplinary research, it may be important to map schools or communities of diverse research topics, and to understand the multidisciplinary role that...
A query performance predictor estimates the retrieval effectiveness of an IR system for a given query. An important characteristic of QPP evaluation is that, since the ground truth retrieval effectiveness for QPP evaluation can be measured with different metrics, the ground truth itself is not absolute, which is in contrast to other retrieval tasks...
This work proposes Field of Study networks as a novel network representation for use in scientometric analysis. We describe the formation of Field of Study (FoS) networks, which relate research topics according to the authors who publish in them, from corpora of articles where fields of study can be identified. FoS networks are particularly useful...
Motivated by the recent success of end-to-end deep neural models for ranking tasks, we present here a supervised end-to-end neural approach for query performance prediction (QPP). In contrast to unsupervised approaches that rely on various statistics of document score distributions, our approach is entirely data-driven. Further, in contrast to weak...
[This corrects the article DOI: 10.1057/s41599-021-00922-7.].
Segmentation of bone regions allows for enhanced diagnostics, disease characterisation and treatment monitoring in CT imaging. In contrast-enhanced whole-body scans, accurate automatic segmentation is particularly difficult, as low-dose whole-body protocols reduce image quality and make contrast-enhanced regions more difficult to separate when relyin...
The novel coronavirus SARS-CoV-2 and the COVID-19 illness it causes have inspired unprecedented levels of multidisciplinary research in an effort to address a generational public health challenge. In this work we conduct a scientometric analysis of COVID-19 research, paying particular attention to the nature of collaboration that this pandemic has...
In recent years, there has been a rapidly expanding focus on explaining the predictions made by black-box AI systems that handle image and tabular data. However, considerably less attention has been paid to explaining the predictions of opaque AI systems handling time series data. In this paper, we advance a novel model-agnostic, case-based techniq...
Traditional information retrieval systems are primarily focused on finding topically-relevant documents, which are descriptive of a particular query concept. However, when working with sources such as collections of news articles, a user might often want to identify not only those documents which describe a news event, but also documents which exp...
Whilst an abundance of techniques have recently been proposed to generate counterfactual explanations for the predictions of opaque black-box systems, markedly less attention has been paid to exploring the uncertainty of these generated explanations. This becomes a critical issue in high-stakes scenarios, where uncertain and misleading explanations...
Recently, it has been proposed that fruitful synergies may exist between Deep Learning (DL) and Case Based Reasoning (CBR); that there are insights to be gained by applying CBR ideas to problems in DL (what could be called DeepCBR). In this paper, we report on a program of research that applies CBR solutions to the problem of Explainable AI (XAI) i...
This paper profiles the recent research work on eXplainable AI (XAI), at the Insight Centre for Data Analytics. This work concentrates on post-hoc explanation-by-example solutions to XAI as one approach to explaining black box deep-learning systems. Three different methods of post-hoc explanation are outlined for image and time-series datasets: tha...
In many real applications of semi-supervised learning, the guidance provided by a human oracle might be “noisy” or inaccurate. Human annotators will often be imperfect, in the sense that they can make subjective decisions, they might only have partial knowledge of the task at hand, or they may simply complete a labeling task incorrectly due to the...
In recent years there has been a cascade of research in attempting to make AI systems more interpretable by providing explanations; so-called Explainable AI (XAI). Most of this research has dealt with the challenges that arise in explaining black-box deep learning systems in classification and regression tasks, with a focus on tabular and image dat...
Algorithmic bias has the capacity to amplify and perpetuate societal bias, and presents profound ethical implications for society. Gender bias in algorithms has been identified in the context of employment advertising and recruitment tools, due to their reliance on underlying language processing and recommendation algorithms. Attempts to address su...
Topic modeling is a popular unsupervised technique that is used to discover the latent thematic structure in text corpora. The evaluation of topic models typically involves measuring the semantic coherence of the terms describing each topic, where a single value is used to summarize the quality of an overall model. However, this can create difficul...
The increasing availability of digital collections of historical and contemporary literature presents a wealth of possibilities for new research in the humanities. The scale and diversity of such collections, however, present particular challenges in identifying and extracting relevant content. This paper presents Curatr, an online platform for the...
Many of the current approaches to automatic organ localisation in medical imaging require a large amount of labelled patient data to train systems to accurately identify specific anatomical features. Cross-correlation, also known as template matching, is a statistical method for assessing the similarity between a template image and a target image. T...
A conditional generative adversarial network has been used for a supervised image-to-image translation task which outputs a synthetic PET scan based on real patient CT data. The network is trained using only data from patients with healthy bone marrow metabolism. This allows a patient-specific synthetic healthy baseline scan to be produced. This...
In complex networks, we say that a network has community structure if subsets of its nodes form dense, highly-connected groups. Algorithms for detecting communities are generally unsupervised, relying solely on the network topology. However, such algorithms can often fail to uncover structure that reflects the underlying communities in the data, pa...
This article unveils the policy agenda of the European Central Bank (ECB) Governing Council as found in the speeches that Governing Council Members gave between 1999 and 2018. Using a dynamic topic-modeling approach based on non-negative matrix factorization, we demonstrate how the issues discussed by ECB Governing Council members have evolved ov...
We present an unsupervised explainable word embedding technique, called EVE, which is built upon the structure of Wikipedia. The proposed model defines the dimensions of a semantic vector representing a word using human-readable labels, thereby making it readily interpretable. Specifically, each vector is constructed using the Wikipedia category graph str...
Algorithms for detecting communities in complex networks are generally unsupervised, relying solely on the structure of the network. However, these methods can often fail to uncover meaningful groupings that reflect the underlying communities in the data, particularly when those structures are highly overlapping. One way to improve the usefulness o...
The increasingly widespread use of Artificial Intelligence brings with it the potential for the generation and reinforcement of bias and discrimination in society. In a range of applications, from recognising people to recommending on-line content, bias has been identified in algorithms generated through machine learning. This paper examines how bia...
Meetup.com is a global online platform which facilitates the organisation of meetups in different parts of the world. A meetup group typically focuses on one specific topic of interest, such as sports, music, language, or technology. However, many users of this platform attend multiple meetups. On this basis, we can construct a co-membership networ...
Word embeddings represent a powerful tool for mining the vocabularies of literary and historical text. However, there is little research demonstrating appropriate strategies for representing text and setting parameters when constructing embedding models within a digital humanities context. In this paper we examine the effects of these choices usin...
Populism, or at the very least a ‘populist zeitgeist’, has advanced across the globe, with populist actors from across the ideological spectrum at the forefront of politics in Europe, North and South America and Southeast Asia. One of the major components is the media, and specifically hybrid media, which can inhibit or magnify populist political tend...
Since 2013, researchers at University College Dublin in the Insight Centre for Data Analytics have been involved in a significant research programme in digital journalism, specifically targeting tools and social media guidelines to support the work of journalists. Most of this programme was undertaken in collaboration with The Irish Times. This coll...
Within the last decade, substantial advances have been made in the field of computational linguistics, due in part to the evolution of word embedding algorithms inspired by neural network models. These algorithms attempt to derive a set of vectors which represent the vocabulary of a textual corpus in a new embedded space. This new representation ca...
Approximately 10% of all haematologic cancers are related to Multiple Myeloma (MM). Whole-body 18F-FDG PETCT is an extremely useful imaging tool for the assessment of patients with MM. The software developed in this research performs a pixel thresholding based segmentation and a semi-automatic placement of regions of interest at key anatomical site...
Semi-supervised algorithms have been shown to improve the results of topic modeling when applied to unstructured text corpora. However, sufficient supervision is not always available. This paper proposes a new process, Weak+, suitable for use in semi-supervised topic modeling via matrix factorization, when limited supervision is available. This pro...
Topic models can provide us with an insight into the underlying latent structure of a large corpus of documents. A range of methods have been proposed in the literature, including probabilistic topic models and techniques based on matrix factorization. However, in both cases, standard implementations rely on stochastic elements in their initializat...
This study analyzes the political agenda of the European Parliament (EP) plenary, how it has evolved over time, and the manner in which Members of the European Parliament (MEPs) have reacted to external and internal stimuli when making plenary speeches. To unveil the plenary agenda and detect latent themes in legislative speeches over time, MEP spe...
Topic modelling techniques such as LDA have recently been applied to speech transcripts and OCR output. These corpora may contain noisy or erroneous texts which may undermine topic stability. Therefore, it is important to know how well a topic modelling algorithm will perform when applied to noisy data. In this paper we show that different types of...
We present TwitterCracy, an exploratory search system that allows users to search and monitor across the Twitter streams of political entities. Its exploratory capabilities stem from the application of lightweight time-series based clustering together with biased PageRank to extract facets from tweets and presenting them in a manner that facilitate...
Introduction: Multiple myeloma (MM) is a malignant hematologic disorder characterized by bone marrow infiltration with neoplastic plasma cells. Approximately 10% of all hematologic cancers are related to MM. Whole-body 18F-FDG PETCT is an extremely useful imaging tool for the assessment of patients with MM. The novel approach developed in this resea...
Virtual Learning Environments (VLE), such as Moodle, are purpose-built platforms in which teachers and students interact to exchange, review, and submit learning material and information. In this paper, we examine a complex VLE dataset from a large Irish university in an attempt to characterize student behavior with respect to deadlines and grades....
In this paper, we explore the design and effects of applying different sliding window methodologies to capture character co-occurrences within literature in order to build social networks. In particular, we focus our analysis on several works of 19th century fiction by Jane Austen and Charles Dickens. We define three different sliding window techni...
Inspired by the increasing availability of large text corpora online, digital humanities scholars are adopting computational approaches to explore questions in the field of literature from new perspectives. In this paper, we examine detailed social networks of characters, extracted from several works of 19th century fiction by Jane Austen and Charl...
News media face many serious concerns as their distribution channels are gradually being taken over by third parties (e.g., people sharing news on Twitter and Facebook, and GoogleNews acting as a news aggregator). If traditional media is to survive at all, it needs to develop innovative strategies around these channels, to maximize audience engagem...
In this paper we address the problem of identifying attention dominating moments in on-line media. We are interested in discovering moments when everyone seems to be talking about the same thing. We investigate one particular aspect of breaking news: the tendency of multiple sources to concentrate attention on a single topic, leading to a collapse...
In this paper we conduct an analysis of Moodle activity data focused on identifying early predictors of good student performance. The analysis shows that three relevant hypotheses are largely supported by the data. These hypotheses are: early submission is a good sign, a high level of activity is predictive of good results and evening activity is e...
Collaborations such as Wikipedia are a key part of the value of the modern Internet. At the same time there is concern that these collaborations are threatened by high levels of member withdrawal. In this paper we borrow ideas from topic analysis to study editor activity on Wikipedia over time using latent space analysis, which offers an insight in...