Derek Greene

University College Dublin | UCD · School of Computer Science

BA Mod, PhD

About

150
Publications
45,676
Reads
3,686
Citations
Introduction
Dr. Derek Greene is Assistant Professor at the UCD School of Computer Science, and a Funded Investigator at the Insight Centre for Data Analytics and the VistaMilk Research Centre. He has over 17 years' experience in the area of machine learning, with a PhD in Computer Science from Trinity College Dublin, and over 60 research papers presented at international conferences and published in journals. His current research focuses on the development of new methods for natural language processing and
Additional affiliations
August 2011 - present
University College Dublin
Position
  • Professor (Assistant)
Education
September 2003 - September 2006
Trinity College Dublin
Field of study
  • Computer Science
October 1999 - May 2003
Trinity College Dublin
Field of study
  • BA, Information and Communications Technology

Publications

Publications (150)
Conference Paper
Full-text available
The task of Query Performance Prediction (QPP) in Information Retrieval (IR) involves predicting the relative effectiveness of a search system for a given input query. Supervised approaches for QPP, such as NeuralQPP [23], are often trained on pairs of queries to capture their relative retrieval performance. However, point-wise approaches, such as t...
Preprint
Many recent deep learning-based solutions have widely adopted the attention-based mechanism in various tasks of the NLP discipline. However, the inherent characteristics of deep learning models and the flexibility of the attention mechanism increase the models' complexity, thus leading to challenges in model explainability. In this paper, to addres...
Preprint
Full-text available
When studying large research corpora, “distant reading” methods are vital to understand the topics and trends in the corresponding research space. In particular, given the recognised benefits of multidisciplinary research, it may be important to map schools or communities of diverse research topics, and to understand the multidisciplinary role that...
Preprint
Full-text available
A query performance predictor estimates the retrieval effectiveness of an IR system for a given query. An important characteristic of QPP evaluation is that, since the ground truth retrieval effectiveness for QPP evaluation can be measured with different metrics, the ground truth itself is not absolute, which is in contrast to other retrieval tasks...
Chapter
This work proposes Field of Study networks as a novel network representation for use in scientometric analysis. We describe the formation of Field of Study (FoS) networks, which relate research topics according to the authors who publish in them, from corpora of articles where fields of study can be identified. FoS networks are particularly useful...
Preprint
Full-text available
Motivated by the recent success of end-to-end deep neural models for ranking tasks, we present here a supervised end-to-end neural approach for query performance prediction (QPP). In contrast to unsupervised approaches that rely on various statistics of document score distributions, our approach is entirely data-driven. Further, in contrast to weak...
Article
Segmentation of bone regions allows for enhanced diagnostics, disease characterisation and treatment monitoring in CT imaging. In contrast-enhanced whole-body scans, accurate automatic segmentation is particularly difficult, as low-dose whole-body protocols reduce image quality and make contrast-enhanced regions more difficult to separate when relyin...
Article
Full-text available
The novel coronavirus SARS-CoV-2 and the COVID-19 illness it causes have inspired unprecedented levels of multidisciplinary research in an effort to address a generational public health challenge. In this work we conduct a scientometric analysis of COVID-19 research, paying particular attention to the nature of collaboration that this pandemic has...
Chapter
Full-text available
In recent years, there has been a rapidly expanding focus on explaining the predictions made by black-box AI systems that handle image and tabular data. However, considerably less attention has been paid to explaining the predictions of opaque AI systems handling time series data. In this paper, we advance a novel model-agnostic, case-based techniq...
Preprint
Full-text available
The novel coronavirus SARS-CoV-2 and the COVID-19 illness it causes have inspired unprecedented levels of multidisciplinary research in an effort to address a generational public health challenge. In this work we conduct a scientometric analysis of COVID-19 research, paying particular attention to the nature of collaboration that this pandemic has...
Conference Paper
Full-text available
Traditional information retrieval systems are primarily focused on finding topically-relevant documents, which are descriptive of a particular query concept. However, when working with sources such as collections of news articles, a user might often want to identify not only those documents which describe a news event, but also documents which exp...
Preprint
Whilst an abundance of techniques have recently been proposed to generate counterfactual explanations for the predictions of opaque black-box systems, markedly less attention has been paid to exploring the uncertainty of these generated explanations. This becomes a critical issue in high-stakes scenarios, where uncertain and misleading explanations...
Preprint
Full-text available
Recently, it has been proposed that fruitful synergies may exist between Deep Learning (DL) and Case Based Reasoning (CBR); that there are insights to be gained by applying CBR ideas to problems in DL (what could be called DeepCBR). In this paper, we report on a program of research that applies CBR solutions to the problem of Explainable AI (XAI) i...
Chapter
This paper profiles the recent research work on eXplainable AI (XAI), at the Insight Centre for Data Analytics. This work concentrates on post-hoc explanation-by-example solutions to XAI as one approach to explaining black box deep-learning systems. Three different methods of post-hoc explanation are outlined for image and time-series datasets: tha...
Article
Full-text available
In many real applications of semi-supervised learning, the guidance provided by a human oracle might be “noisy” or inaccurate. Human annotators will often be imperfect, in the sense that they can make subjective decisions, they might only have partial knowledge of the task at hand, or they may simply complete a labeling task incorrectly due to the...
Preprint
In recent years there has been a cascade of research in attempting to make AI systems more interpretable by providing explanations; so-called Explainable AI (XAI). Most of this research has dealt with the challenges that arise in explaining black-box deep learning systems in classification and regression tasks, with a focus on tabular and image dat...
Preprint
Full-text available
Segmentation of bone regions allows for enhanced diagnostics, disease characterisation and treatment monitoring in CT imaging. In contrast-enhanced whole-body scans, accurate automatic segmentation is particularly difficult, as low-dose whole-body protocols reduce image quality and make contrast-enhanced regions more difficult to separate when relyin...
Chapter
Algorithmic bias has the capacity to amplify and perpetuate societal bias, and presents profound ethical implications for society. Gender bias in algorithms has been identified in the context of employment advertising and recruitment tools, due to their reliance on underlying language processing and recommendation algorithms. Attempts to address su...
Article
Topic modeling is a popular unsupervised technique that is used to discover the latent thematic structure in text corpora. The evaluation of topic models typically involves measuring the semantic coherence of the terms describing each topic, where a single value is used to summarize the quality of an overall model. However, this can create difficul...
Preprint
Full-text available
Algorithmic bias has the capacity to amplify and perpetuate societal bias, and presents profound ethical implications for society. Gender bias in algorithms has been identified in the context of employment advertising and recruitment tools, due to their reliance on underlying language processing and recommendation algorithms. Attempts to address s...
Chapter
The increasing availability of digital collections of historical and contemporary literature presents a wealth of possibilities for new research in the humanities. The scale and diversity of such collections, however, present particular challenges in identifying and extracting relevant content. This paper presents Curatr, an online platform for the...
Conference Paper
Full-text available
Many of the current approaches to automatic organ localisation in medical imaging require a large amount of labelled patient data to train systems to accurately identify specific anatomical features. Cross-Correlation, also known as template matching, is a statistical method of assessing the similarity between a template image and a target image. T...
Conference Paper
Full-text available
A Conditional-Generative Adversarial Network has been used for a supervised image-to-image translation task which outputs a synthetic PET scan based on real patient CT data. The network is trained using only data of patients with healthy bone marrow metabolism. This allows for a patient specific synthetic healthy baseline scan to be produced. This...
Article
Full-text available
In complex networks, we say that a network has community structure if subsets of its nodes form dense, highly-connected groups. Algorithms for detecting communities are generally unsupervised, relying solely on the network topology. However, such algorithms can often fail to uncover structure that reflects the underlying communities in the data, pa...
Article
This article unveils the policy agenda of the European Central Bank (ECB) Governing Council as found in the speeches that Governing Council Members gave between 1999 and 2018. Using a dynamic topic‐modeling approach based on non‐negative matrix factorization, we demonstrate how the issues discussed by ECB Governing Council members have evolved ov...
Article
Full-text available
We present an unsupervised explainable word embedding technique, called EVE, which is built upon the structure of Wikipedia. The proposed model defines the dimensions of a semantic vector representing a word using human-readable labels, thereby making it readily interpretable. Specifically, each vector is constructed using the Wikipedia category graph str...
Chapter
Algorithms for detecting communities in complex networks are generally unsupervised, relying solely on the structure of the network. However, these methods can often fail to uncover meaningful groupings that reflect the underlying communities in the data, particularly when those structures are highly overlapping. One way to improve the usefulness o...
Poster
The increasingly widespread use of Artificial Intelligence brings with it the potential for the generation and reinforcement of bias and discrimination in society. In a range of applications, from recognising people to recommending on-line content, bias has been identified in algorithms generated through machine learning. This paper examines how bia...
Preprint
Algorithms for detecting communities in complex networks are generally unsupervised, relying solely on the structure of the network. However, these methods can often fail to uncover meaningful groupings that reflect the underlying communities in the data, particularly when those structures are highly overlapping. One way to improve the usefulness o...
Preprint
Full-text available
Meetup.com is a global online platform which facilitates the organisation of meetups in different parts of the world. A meetup group typically focuses on one specific topic of interest, such as sports, music, language, or technology. However, many users of this platform attend multiple meetups. On this basis, we can construct a co-membership networ...
Conference Paper
Full-text available
Word embeddings represent a powerful tool for mining the vocabularies of literary and historical text. However, there is little research demonstrating appropriate strategies for representing text and setting parameters, when constructing embedding models within a digital humanities context. In this paper we examine the effects of these choices usin...
Article
Populism, or at the very least a ‘populist zeitgeist’ has advanced across the globe with populist actors from across the ideological spectrum at the forefront of politics in Europe, North and South America and Southeast Asia. One of the major components is the media and specifically hybrid media, which can inhibit or magnify populist political tend...
Article
Full-text available
Since 2013 researchers at University College Dublin in the Insight Centre for Data Analytics have been involved in a significant research programme in digital journalism, specifically targeting tools and social media guidelines to support the work of journalists. Most of this programme was undertaken in collaboration with The Irish Times. This coll...
Conference Paper
Full-text available
Within the last decade, substantial advances have been made in the field of computational linguistics, due in part to the evolution of word embedding algorithms inspired by neural network models. These algorithms attempt to derive a set of vectors which represent the vocabulary of a textual corpus in a new embedded space. This new representation ca...
Conference Paper
Full-text available
Approximately 10% of all haematologic cancers are related to Multiple Myeloma (MM). Whole-body 18F-FDG PETCT is an extremely useful imaging tool for the assessment of patients with MM. The software developed in this research performs a pixel thresholding based segmentation and a semi-automatic placement of regions of interest at key anatomical site...
Conference Paper
Semi-supervised algorithms have been shown to improve the results of topic modeling when applied to unstructured text corpora. However, sufficient supervision is not always available. This paper proposes a new process, Weak+, suitable for use in semi-supervised topic modeling via matrix factorization, when limited supervision is available. This pro...
Article
Topic models can provide us with an insight into the underlying latent structure of a large corpus of documents. A range of methods have been proposed in the literature, including probabilistic topic models and techniques based on matrix factorization. However, in both cases, standard implementations rely on stochastic elements in their initializat...
Article
This study analyzes the political agenda of the European Parliament (EP) plenary, how it has evolved over time, and the manner in which Members of the European Parliament (MEPs) have reacted to external and internal stimuli when making plenary speeches. To unveil the plenary agenda and detect latent themes in legislative speeches over time, MEP spe...
Conference Paper
Full-text available
Topic modelling techniques such as LDA have recently been applied to speech transcripts and OCR output. These corpora may contain noisy or erroneous texts which may undermine topic stability. Therefore, it is important to know how well a topic modelling algorithm will perform when applied to noisy data. In this paper we show that different types of...
Data
Topic modelling techniques such as LDA have recently been applied to speech transcripts and OCR output. These corpora may contain noisy or erroneous texts which may undermine topic stability. Therefore, it is important to know how well a topic modelling algorithm will perform when applied to noisy data. In this paper we show that different types of...
Conference Paper
We present TwitterCracy, an exploratory search system that allows users to search and monitor across the Twitter streams of political entities. Its exploratory capabilities stem from the application of lightweight time-series based clustering together with biased PageRank to extract facets from tweets and presenting them in a manner that facilitate...
Article
Full-text available
Introduction: Multiple myeloma (MM) is a malignant hematologic disorder characterized by bone marrow infiltration with neoplastic plasma cells. Approximately 10% of all hematologic cancers are related to MM. Whole-body 18F-FDG PETCT is an extremely useful imaging tool for the assessment of patients with MM. The novel approach developed in this resea...
Conference Paper
Full-text available
Virtual Learning Environments (VLE), such as Moodle, are purpose-built platforms in which teachers and students interact to exchange, review, and submit learning material and information. In this paper, we examine a complex VLE dataset from a large Irish university in an attempt to characterize student behavior with respect to deadlines and grades....
Conference Paper
In this paper, we explore the design and effects of applying different sliding window methodologies to capture character co-occurrences within literature in order to build social networks. In particular, we focus our analysis on several works of 19th century fiction by Jane Austen and Charles Dickens. We define three different sliding window techni...
Conference Paper
Inspired by the increasing availability of large text corpora online, digital humanities scholars are adopting computational approaches to explore questions in the field of literature from new perspectives. In this paper, we examine detailed social networks of characters, extracted from several works of 19th century fiction by Jane Austen and Charl...
Conference Paper
News media face many serious concerns as their distribution channels are gradually being taken over by third parties (e.g., people sharing news on Twitter and Facebook, and GoogleNews acting as a news aggregator). If traditional media is to survive at all, it needs to develop innovative strategies around these channels, to maximize audience engagem...
Conference Paper
Full-text available
In this paper we address the problem of identifying attention dominating moments in on-line media. We are interested in discovering moments when everyone seems to be talking about the same thing. We investigate one particular aspect of breaking news: the tendency of multiple sources to concentrate attention on a single topic, leading to a collapse...
Article
Full-text available
In this paper we conduct an analysis of Moodle activity data focused on identifying early predictors of good student performance. The analysis shows that three relevant hypotheses are largely supported by the data. These hypotheses are: early submission is a good sign, a high level of activity is predictive of good results and evening activity is e...
Conference Paper
Collaborations such as Wikipedia are a key part of the value of the modern Internet. At the same time there is concern that these collaborations are threatened by high levels of member withdrawal. In this paper we borrow ideas from topic analysis to study editor activity on Wikipedia over time using latent space analysis, which offers an insight in...