Danushka Bollegala

University of Liverpool | UoL · School of Electrical Engineering, Electronics and Computer Science

PhD (Information Engineering)

About

203 Publications
29,914 Reads
3,215 Citations
Since 2016: 106 research items, 1,910 citations
[Chart: citations per year, 2016–2022]
Introduction
I have broad research interests in Artificial Intelligence (AI), spanning fields such as Natural Language Processing (NLP), Web/Data Mining (DM), Machine Learning (ML), and Evolutionary Computation (EC). I have worked on numerous research problems in those fields and have published my work in top international conferences such as ACL, NAACL, EMNLP, COLING, WWW, WSDM, IJCAI, AAAI, and ECAI, and in journals such as IEEE TKDE, ACM TALIP, and Elsevier IPM.
Additional affiliations
April 2012 - present
The University of Tokyo
Position
  • Associate Professor
April 2010 - March 2012
The University of Tokyo
Position
  • Assistant Professor
April 2007 - March 2010
Japan Society for the Promotion of Science

Publications (203)
Preprint
We show that the $\ell_2$ norm of a static sense embedding encodes information related to the frequency of that sense in the training corpus used to learn the sense embeddings. This finding can be seen as an extension of a previously known relationship for word embeddings to sense embeddings. Our experimental results show that, in spite of its simp...
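The finding described above lends itself to a minimal sketch: given toy sense embeddings (all vectors and sense labels below are hypothetical, chosen only for illustration), the more frequent sense is the one whose vector has the larger l2 norm.

```python
import math

# Toy sense embeddings (hypothetical values, for illustration only).
# The paper's finding: the l2 norm of a static sense embedding reflects
# how frequent that sense was in the training corpus.
sense_embeddings = {
    "bank_river": [0.1, 0.2, 0.1],  # assumed rare sense
    "bank_money": [0.9, 1.2, 0.8],  # assumed frequent sense
}

def l2_norm(vec):
    """Euclidean (l2) norm of a vector."""
    return math.sqrt(sum(x * x for x in vec))

norms = {sense: l2_norm(vec) for sense, vec in sense_embeddings.items()}

# Under the stated relationship, the frequent sense should score higher:
assert norms["bank_money"] > norms["bank_river"]
```

In practice the relationship is measured statistically over a whole vocabulary of senses, not asserted per pair as in this toy sketch.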
Preprint
Full-text available
We study the relationship between task-agnostic intrinsic and task-specific extrinsic social bias evaluation measures for Masked Language Models (MLMs), and find that there exists only a weak correlation between these two types of evaluation measures. Moreover, we find that MLMs debiased using different methods still re-learn social biases during f...
Preprint
Dynamic contextualised word embeddings represent the temporal semantic variations of words. We propose a method for learning dynamic contextualised word embeddings by time-adapting a pretrained Masked Language Model (MLM) using time-sensitive templates. Given two snapshots $C_1$ and $C_2$ of a corpus taken respectively at two distinct timestamps $...
Article
Systematic literature review (SLR) is a crucial method for clinicians and policymakers to make decisions in a flood of new clinical studies. Because manual literature screening in SLR is a highly laborious task, its automation by natural language processing (NLP) has been welcomed. Although intervention is key information for literature scr...
Conference Paper
Meta-embedding (ME) learning is an emerging approach that attempts to learn more accurate word embeddings given existing (source) word embeddings as the sole input. Due to their ability to incorporate semantics from multiple source embeddings in a compact manner with superior performance, ME learning has gained popularity among practitioners in NLP...
Article
Masked Language Models (MLMs) have shown superior performances in numerous downstream Natural Language Processing (NLP) tasks. Unfortunately, MLMs also demonstrate significantly worrying levels of social biases. We show that the previously proposed evaluation metrics for quantifying the social biases in MLMs are problematic due to the following rea...
Preprint
Full-text available
Combining multiple source embeddings to create meta-embeddings is considered effective to obtain more accurate embeddings. Different methods have been proposed to develop meta-embeddings from a given set of source embeddings. However, the source embeddings can contain unfair gender bias, and the bias in the combination of multiple embeddings and de...
Preprint
Full-text available
Masked Language Models (MLMs) pre-trained by predicting masked tokens on large corpora have been used successfully in natural language processing tasks for a variety of languages. Unfortunately, it was reported that MLMs also learn discriminative biases regarding attributes such as gender and race. Because most studies have focused on MLMs in Engli...
Preprint
Full-text available
Prior work on integrating text corpora with knowledge graphs (KGs) to improve Knowledge Graph Embedding (KGE) has obtained good performance for entities that co-occur in sentences in text corpora. Such sentences (textual mentions of entity-pairs) are represented as Lexicalised Dependency Paths (LDPs) between two entities. However, it is not possib...
Preprint
Meta-embedding (ME) learning is an emerging approach that attempts to learn more accurate word embeddings given existing (source) word embeddings as the sole input. Due to their ability to incorporate semantics from multiple source embeddings in a compact manner with superior performance, ME learning has gained popularity among practitioners in NLP...
Preprint
Full-text available
A variety of contextualised language models have been proposed in the NLP community, which are trained on diverse corpora to produce numerous Neural Language Models (NLMs). However, different NLMs have reported different levels of performances in downstream NLP applications when used as text representations. We propose a sentence-level meta-embeddi...
Preprint
Full-text available
Probing Pre-trained Language Models (PLMs) using prompts has indirectly implied that language models (LMs) can be treated as knowledge bases. To this end, this phenomenon has been effective especially when these LMs are fine-tuned not just on data of a specific domain, but also on the style or linguistic pattern of the prompts themselves. We ob...
Preprint
Sense embedding learning methods learn different embeddings for the different senses of an ambiguous word. One sense of an ambiguous word might be socially biased while its other senses remain unbiased. In comparison to the numerous prior work evaluating the social biases in pretrained word embeddings, the biases in sense embeddings have been relat...
Article
Full-text available
Word embedding models have recently shown some capability to encode hierarchical information that exists in textual data. However, such models do not explicitly encode the hierarchical structure that exists among words. In this work, we propose a method to learn hierarchical word embeddings (HWEs) in a specific order to encode the hierarchical info...
Article
The recent popularization of social web services has made them one of the primary uses of the World Wide Web. An important concept in social web services is social actions such as making connections and communicating with others and adding annotations to web resources. Predicting social actions would improve many fundamental web applications, such...
Article
Full-text available
This paper presents a novel unsupervised abstractive summarization method for opinionated texts. While the basic variational autoencoder-based models assume a unimodal Gaussian prior for the latent code of sentences, we alternate it with a recursive Gaussian mixture, where each mixture component corresponds to the latent code of a topic sentence an...
Chapter
Curated Document Databases (CDD) play an important role in helping researchers find relevant articles in scientific literature. Considerable recent attention has been given to the use of various document ranking algorithms to support the maintenance of CDDs. The typical approach is to represent the update document collection using a form of word em...
Article
More and more users have been taking various actions on diverse resources referred to by URLs, such as news, web pages, images, products, and movies, as a result of the growth of social media. They are annotating, tweeting on Twitter, reblogging on Tumblr, Liking on Facebook, etc. Analyses of these diverse actions will be useful for aggregating or...
Preprint
Full-text available
This paper presents a novel unsupervised abstractive summarization method for opinionated texts. While the basic variational autoencoder-based models assume a unimodal Gaussian prior for the latent code of sentences, we alternate it with a recursive Gaussian mixture, where each mixture component corresponds to the latent code of a topic sentence an...
Preprint
Cross-lingual text representations have gained popularity lately and act as the backbone of many tasks such as unsupervised machine translation and cross-lingual information retrieval, to name a few. However, evaluation of such representations is difficult in domains beyond standard benchmarks due to the necessity of obtaining domain-specific p...
Preprint
Full-text available
Masked Language Models (MLMs) have shown superior performances in numerous downstream NLP tasks when used as text encoders. Unfortunately, MLMs also demonstrate significantly worrying levels of social biases. We show that the previously proposed evaluation metrics for quantifying the social biases in MLMs are problematic due to the following reasons: (...
Preprint
Full-text available
A health outcome is a measurement or an observation used to capture and assess the effect of a treatment. Automatic detection of health outcomes from text would undoubtedly speed up access to evidence necessary in healthcare decision making. Prior work on outcome detection has modelled this task as either (a) a sequence labelling task, where the go...
Preprint
Full-text available
Counterfactual statements describe events that did not or cannot take place. We consider the problem of counterfactual detection (CFD) in product reviews. For this purpose, we annotate a multilingual CFD dataset from Amazon product reviews covering counterfactual statements written in English, German, and Japanese languages. The dataset is unique a...
Preprint
Full-text available
Negative sampling is a limiting factor w.r.t. the generalization of metric-learned neural networks. We show that uniform negative sampling provides little information about the class boundaries and thus propose three novel techniques for efficient negative sampling: drawing negative samples from (1) the top-$k$ most semantically similar classes, (2...
Preprint
Full-text available
Embedding entities and relations of a knowledge graph in a low-dimensional space has shown impressive performance in predicting missing links between entities. Although progress has been made, existing methods are heuristically motivated and the theoretical understanding of such embeddings is comparatively underdeveloped. This paper extends the...
Preprint
Full-text available
In comparison to the numerous debiasing methods proposed for the static non-contextualised word embeddings, the discriminative biases in contextualised embeddings have received relatively little attention. We propose a fine-tuning method that can be applied at token- or sentence-levels to debias pre-trained contextualised embeddings. Our proposed m...
Preprint
Full-text available
Word embeddings trained on large corpora have been shown to encode high levels of unfair discriminatory gender, racial, religious and ethnic biases. In contrast, human-written dictionaries describe the meanings of words in a concise, objective, and unbiased manner. We propose a method for debiasing pre-trained word embeddings using dictionaries, witho...
Preprint
Full-text available
Multi-step ahead prediction in language models is challenging due to the discrepancy between training and test time processes. At test time, a sequence predictor is required to make predictions given past predictions as the input, instead of the past targets that are provided during training. This difference, known as exposure bias, can lead to the...
Chapter
Curated Document Databases play a critical role in helping researchers find relevant articles in available literature. One such database is the ORRCA (Online Resource for Recruitment research in Clinical trials) database. The ORRCA database brings together published work in the field of clinical trials recruitment research into a single searchable...
Article
Explanation has been a central feature of AI systems for legal reasoning since their inception. Recently, the topic of explanation of decisions has taken on a new urgency, throughout AI in general, with the increasing deployment of AI tools and the need for lay users to be able to place trust in the decisions that the support tools are recommending...
Preprint
Prior work investigating the geometry of pre-trained word embeddings has shown that word embeddings are distributed in a narrow cone, and that by centering and projecting using principal component vectors one can increase the accuracy of a given set of pre-trained word embeddings. However, theoretically, this post-processing step is equivalent to appl...
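The centering half of the post-processing mentioned above can be sketched in a few lines with toy 2-d vectors (all values hypothetical); removal of the top principal components, the other half of the procedure, is omitted here for brevity.

```python
# Minimal sketch of mean-centering a set of word vectors, so that they no
# longer cluster around a shared mean direction (the "narrow cone").
# Toy 2-d vectors; words and values are hypothetical.
embeddings = {
    "cat": [1.0, 2.0],
    "dog": [2.0, 3.0],
    "car": [3.0, 4.0],
}

dim = 2
mean = [sum(vec[d] for vec in embeddings.values()) / len(embeddings)
        for d in range(dim)]

centered = {word: [v - m for v, m in zip(vec, mean)]
            for word, vec in embeddings.items()}

# After centering, the vectors sum to the zero vector in every dimension.
for d in range(dim):
    assert abs(sum(vec[d] for vec in centered.values())) < 1e-9
```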
Chapter
A semantic relation between two given words a and b can be represented using two complementary sources of information: (a) the semantic representations of a and b (expressed as word embeddings) and, (b) the contextual information obtained from the co-occurrence contexts of the two words (expressed in the form of lexico-syntactic patterns). Pattern-...
Chapter
This paper examines the effect of using co-reference chains based conversational history against the use of entire conversation history for conversational question answering (CoQA) task. The QANet model is modified to include conversational history and NeuralCoref is used to obtain co-reference chains based conversation history. The results of the...
Chapter
Knowledge Graph Embedding methods learn low-dimensional representations for entities and relations in knowledge graphs, which can be used to infer previously unknown relations between pairs of entities in the knowledge graph. This is particularly useful for expanding otherwise sparse knowledge graphs. However, the relation types that can be predict...
Chapter
Prior work has investigated the problem of mining arguments from online reviews by classifying opinions based on the stance expressed explicitly or implicitly. An implicit opinion has the stance left unexpressed linguistically, while an explicit opinion has the stance expressed explicitly. In this paper, we propose a bipartite graph-based approach to r...
Chapter
Evaluating the performance of neural network-based text generators and density estimators is challenging since no one measure perfectly evaluates language quality. Perplexity has been a mainstay metric for neural language models trained by maximizing the conditional log-likelihood. We argue perplexity alone is a naive measure since it does not expl...
Preprint
Full-text available
The Conversational Question Answering (CoQA) task involves answering a sequence of inter-related conversational questions about a contextual paragraph. Although existing approaches employ human-written ground-truth answers for answering conversational questions at test time, in a realistic scenario, the CoQA model will not have any access to ground...
Preprint
Full-text available
Dialogue engines that incorporate different types of agents to converse with humans are popular. However, conversations are dynamic in the sense that a selected response will change the conversation on-the-fly, influencing the subsequent utterances in the conversation, which makes the response selection a challenging problem. We model the problem o...
Preprint
This paper presents a contextualized graph attention network that combines edge features and multiple sub-graphs for improving relation extraction. A novel method is proposed to use multiple sub-graphs to learn rich node representations in graph-based networks. To this end multiple sub-graphs are obtained from a single dependency tree. Two types of...
Preprint
Domain adaptation considers the problem of generalising a model learnt using data from a particular source domain to a different target domain. Often it is difficult to find a suitable single source to adapt from, and one must consider multiple sources. Using an unrelated source can result in sub-optimal performance, known as the \emph{negative tra...
Preprint
Language-independent tokenisation (LIT) methods that do not require labelled language resources or lexicons have recently gained popularity because of their applicability in resource-poor languages. Moreover, they compactly represent a language using a fixed size vocabulary and can efficiently handle unseen or rare words. On the other hand, languag...
Preprint
We propose a novel non-parametric method for cross-modal retrieval which is applied on top of precomputed image and text embeddings. By combining our method with standard approaches for building image and text encoders, trained independently with a self-supervised classification objective, we create a baseline model which outperforms most existing...
Conference Paper
Full-text available
Prior work has investigated the problem of mining arguments from online reviews by classifying opinions based on the stance expressed explicitly or implicitly. An implicit opinion has the stance left unexpressed linguistically, while an explicit opinion has the stance expressed explicitly. In this paper, we propose a bipartite graph-based approach to r...
Preprint
Protecting the privacy of search engine users is an important requirement in many information retrieval scenarios. A user might not want a search engine to guess his or her information need despite requesting relevant results. We propose a method to protect the privacy of search engine users by decomposing the queries using semantically \emph{relat...
Preprint
Full-text available
Task-specific scores are often used to optimize for and evaluate the performance of conditional text generation systems. However, such scores are non-differentiable and cannot be used in the standard supervised learning paradigm. Hence, policy gradient methods are used since the gradient can be computed without requiring a differentiable objective....
Conference Paper
Coherent division of legal document bundles, whether this is done in the context of court bundles, briefs or some other application, is a time consuming and challenging task. We propose an approach whereby this process can be automated. Two variations are considered. The first addresses the scenario where the topic labelling is pre-defined and adop...
Preprint
Word embeddings learnt from massive text collections have demonstrated significant levels of discriminative biases such as gender, racial or ethnic biases, which in turn bias the down-stream NLP applications that use those word embeddings. Taking gender-bias as a working example, we propose a debiasing method that preserves non-discriminative gende...
Preprint
Full-text available
In this paper we propose a novel neural language modelling (NLM) method based on \textit{error-correcting output codes} (ECOC), abbreviated as ECOC-NLM. This latent variable based approach provides a principled way to choose a varying amount of latent output codes and avoids exact softmax normalization. Instead of minimizing measures between the pr...
Chapter
Full-text available
Continuous authentication is significant with respect to many online applications where it is desirable to monitor a user’s identity throughout an entire session; not just at the beginning of the session. One example application domain, where this is a requirement, is in relation to Massive Open Online Courses (MOOCs) when users wish to obtain some...
Preprint
Full-text available
We propose in this paper a combined model of Long Short Term Memory and Convolutional Neural Networks (LSTM-CNN) that exploits word embeddings and positional embeddings for cross-sentence n-ary relation extraction. The proposed model brings together the properties of both LSTMs and CNNs, to simultaneously exploit long-range sequential information a...
Preprint
Full-text available
This paper carries out an empirical analysis of various dropout techniques for language modelling, such as Bernoulli dropout, Gaussian dropout, Curriculum Dropout, Variational Dropout and Concrete Dropout. Moreover, we propose an extension of variational dropout to concrete dropout and curriculum dropout with varying schedules. We find these extens...
Chapter
A classifier-based pattern selection approach for relation instance extraction is proposed in this paper. The approach employs a binary classifier that filters out patterns that extract incorrect entities for a given relation from a pattern set obtained using global estimates such as high frequency. The pro...
Preprint
Full-text available
Word embeddings have been shown to benefit from ensembling several word embedding sources, often carried out using straightforward mathematical operations over the set of vectors to produce a meta-embedding representation. More recently, unsupervised learning has been used to find a lower-dimensional representation, similar in size to that of the w...
Preprint
Full-text available
The task of multi-step ahead prediction in language models is challenging considering the discrepancy between training and testing. At test time, a language model is required to make predictions given past predictions as input, instead of the past targets that are provided during training. This difference, known as exposure bias, can lead to the co...
Preprint
Full-text available
Ensembling word embeddings to improve distributed word representations has shown good success for natural language processing tasks in recent years. These approaches either carry out straightforward mathematical operations over a set of vectors or use unsupervised learning to find a lower-dimensional representation. This work compares meta-embeddin...
Conference Paper
Distributed word embeddings have shown superior performances in numerous Natural Language Processing (NLP) tasks. However, their performances vary significantly across different tasks, implying that the word embeddings learnt by those methods capture complementary aspects of lexical semantics. Therefore, we believe that it is important to combine t...
Research
Full-text available
Selecting pivot features that connect a source domain to a target domain is an important first step in unsupervised domain adaptation (UDA). Although different strategies such as the frequency of a feature in a domain, mutual (or pointwise mutual) information have been proposed in prior work in domain adaptation (DA) for selecting pivots, a compara...
Conference Paper
Full-text available
Online reviews have become a popular portal among customers making decisions about purchasing products. A number of corpora of reviews have been widely investigated in NLP in general, and, in particular, in argument mining. This is a subset of NLP that deals with extracting arguments and the relations among them from user-based content. A major pro...
Article
Selecting pivot features that connect a source domain to a target domain is an important first step in unsupervised domain adaptation (UDA). Although different strategies such as the frequency of a feature in a domain, mutual (or pointwise mutual) information have been proposed in prior work in domain adaptation (DA) for selecting pivots, a compara...
Article
Full-text available
Background: Detecting adverse drug reactions (ADRs) is an important task that has direct implications for the use of that drug. If we can detect previously unknown ADRs as quickly as possible, then this information can be provided to the regulators, pharmaceutical companies, and health care organizations, thereby potentially reducing drug-related...
Conference Paper
Full-text available
Methods for learning lower-dimensional representations (embeddings) of words using unlabelled data have received renewed interest due to their myriad successes in various Natural Language Processing (NLP) tasks. However, despite their success, a common deficiency associated with most word embedding learning methods is that they learn a single rep...
Article
Creating accurate meta-embeddings from pre-trained source embeddings has received attention lately. Methods based on global and locally-linear transformation and concatenation have shown to produce accurate meta-embeddings. In this paper, we show that the arithmetic mean of two distinct word embedding sets yields a performant meta-embedding that is...
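The averaging approach described above admits a minimal sketch: given two source embeddings of the same word (assumed here to be of equal dimensionality and in comparable spaces), the meta-embedding is their element-wise arithmetic mean. All names and vectors below are toy values for illustration.

```python
# Hypothetical source embedding sets for the same vocabulary.
source_a = {"cat": [1.0, 0.0, 2.0]}
source_b = {"cat": [3.0, 2.0, 0.0]}

def average_meta_embedding(word):
    """Element-wise arithmetic mean of a word's two source embeddings."""
    va, vb = source_a[word], source_b[word]
    return [(x + y) / 2.0 for x, y in zip(va, vb)]

meta = average_meta_embedding("cat")  # [2.0, 1.0, 1.0]
```

Real meta-embedding methods additionally align or normalise the source spaces before combining them; this sketch assumes that step has already been done.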
Article
Short and sparse texts such as tweets, search engine snippets, product reviews, and chat messages are abundant on the Web. Classifying such short-texts into a pre-defined set of categories is a common problem that arises in various contexts, such as sentiment classification, spam detection, and information recommendation. The fundamental problem in...
Article
Full-text available
Methods for representing the meaning of words in vector spaces purely using the information distributed in text corpora have proved to be very valuable in various text mining and natural language processing (NLP) tasks. However, these methods still disregard the valuable semantic relational structure between words in co-occurring contexts. These be...
Article
Full-text available
Keyboard typing patterns are a form of behavioural biometric that can be usefully employed for the purpose of user authentication. The technique has been extensively investigated with respect to the typing of fixed texts such as passwords and pin numbers, so-called static authentication. The typical approach is to compare a current “typing sample”...