Nisansa de Silva

University of Moratuwa | UoM · Department of Computer Science and Engineering

PhD

About

106
Publications
508,647
Reads
907
Citations

Publications (106)
Conference Paper
Full-text available
Searching for a cure for cancer is one of the most vital pursuits in modern medicine. In that respect, microRNA research plays a key role. Keeping track of the shifts and changes in established knowledge in the microRNA domain is very important. In this paper, we introduce an Ontology-Based Information Extraction method to detect occurrences of incon...
Article
Full-text available
Sinhala, despite its several millennia long history, remains a resource poor language. The objective of this study was to explore the possibility of enhancing the text classification process of a resource poor language by means of data and tools from a resource rich language. However, it was discovered that if the feature space is based on an n-gra...
Preprint
Full-text available
Sinhala is the native language of the Sinhalese people who make up the largest ethnic group of Sri Lanka. The language belongs to the globe-spanning language tree, Indo-European. However, due to poverty in both linguistic and economic capital, Sinhala, in the perspective of Natural Language Processing tools and research, remains a resource-poor lan...
Preprint
Full-text available
Aspect-based Sentiment Analysis (ABSA) is a critical task in Natural Language Processing (NLP) that focuses on extracting sentiments related to specific aspects within a text, offering deep insights into customer opinions. Traditional sentiment analysis methods, while useful for determining overall sentiment, often miss the implicit opinions about...
Preprint
Full-text available
In the rapidly evolving digital era, there is an increasing demand for concise information as individuals seek to distil key insights from various sources. Recent attention from researchers on Multi-document Summarisation (MDS) has resulted in diverse datasets covering customer reviews, academic papers, medical and legal documents, and news article...
Preprint
Full-text available
Since the dawn of the digitalisation era, customer feedback and online reviews are unequivocally major sources of insights for businesses. Consequently, conducting comparative analyses of such sources has become the de facto modus operandi of any business that wishes to give itself a competitive edge over its peers and improve customer loyalty. Sen...
Preprint
Full-text available
Manual data annotation is an important NLP task but one that takes a considerable amount of resources and effort. In spite of the costs, labeling and categorizing entities is essential for NLP tasks such as semantic evaluation. Even though annotation can be done by non-experts in most cases, due to the fact that this requires human labor, the process...
Preprint
Full-text available
We analysed a sample of NLP research papers archived in ACL Anthology as an attempt to quantify the degree of openness and the benefit of such an open culture in the NLP community. We observe that papers published in different NLP venues show different patterns related to artefact reuse. We also note that more than 30% of the papers we analysed do...
Preprint
Full-text available
This paper is aimed at evaluating state-of-the-art models for Multi-document Summarization (MDS) on different types of datasets in various domains and investigating the limitations of existing models to determine future research directions. To address this gap, we conducted an extensive literature review to identify state-of-the-art models and data...
Conference Paper
Full-text available
Manual data annotation is an important NLP task but one that takes a considerable amount of resources and effort. In spite of the costs, labelling and categorizing entities are essential for NLP tasks such as semantic evaluation. Even though annotation can be done by non-experts in most cases, due to the fact that this requires human labour, the pr...
Preprint
Full-text available
Parallel datasets are vital for performing and evaluating any kind of multilingual task. However, in the cases where one of the considered language pairs is a low-resource language, the existing top-down parallel data such as corpora are lacking in both tally and quality due to the dearth of human annotation. Therefore, for low-resource languages,...
Preprint
Full-text available
This paper introduces the Forgotten Realms Wiki (FRW) data set and domain specific natural language generation using FRW along with related analyses. Forgotten Realms is the de-facto default setting of the popular open ended tabletop fantasy role playing game, Dungeons & Dragons. The data set was extracted from the Forgotten Realms Fandom wiki cons...
Conference Paper
Full-text available
With the popularity of smartphones, mobile application (a.k.a. mobile app) development has become a booming industry all across the world. One of the main hurdles that app developers face is understanding users’ needs and tailoring their products to satisfy the users. Though users are one of the main stakeholders of the app development process...
Conference Paper
Full-text available
Sifting through hundreds of old case documents to obtain information pertinent to the case in hand has been a major part of the legal profession for centuries. However, with the expansion of court systems and the compounding nature of case law, this task has become more and more intractable with time and resource constraints. Thus automation by Nat...
Article
Full-text available
The rapid growth of text corpora across various domains has created a need and an opportunity to leverage Natural Language Processing to automate and efficiently streamline tedious manual tasks. The legal domain is one such text-rich domain, which has seen rapid growth of text corpora and a growing requirement for natural language processing applications. In the...
Article
Full-text available
Research on natural language processing in most regional languages is hindered due to resource poverty. A possible solution for this is utilization of social media data in research. For example, the Facebook network allows its users to record their reactions to text via a typology of emotions. This network, taken at scale, is therefore a prime data...
Preprint
Full-text available
In the process of numerically modeling natural languages, developing language embeddings is a vital step. However, it is challenging to develop functional embeddings for resource-poor languages such as Sinhala, for which sufficiently large corpora, effective language parsers, and any other required resources are difficult to find. In such condition...
Conference Paper
Full-text available
As Natural Language Processing is evolving rapidly, it is used to analyze domain specific large text corpora. Applying Natural Language Processing in a domain with uncommon vocabulary and unique semantics requires techniques specifically designed for that domain. The legal domain is such an area with unique vocabulary and semantic interpretations....
Conference Paper
Full-text available
This paper introduces the Forgotten Realms Wiki (FRW) data set and domain specific natural language generation using FRW along with related analyses. Forgotten Realms is the de-facto default setting of the popular open ended tabletop fantasy role playing game, Dungeons & Dragons. The data set was extracted from the Forgotten Realms Fandom wiki cons...
Preprint
Full-text available
Linguistic disparity in the NLP world is a problem that has been widely acknowledged recently. However, different facets of this problem, or the reasons behind this disparity are seldom discussed within the NLP community. This paper provides a comprehensive analysis of the disparity that exists within the languages of the world. We show that simply...
Article
Full-text available
Since the advent of deep learning based Natural Language Processing (NLP), diverse domains of human society have benefited from automation and the resultant increment in efficiency. Law and order are, undoubtedly, crucial for the proper functioning of society; for without law there would be chaos, failing to offer equality to everyone. The legal do...
Article
Full-text available
Existing literature demonstrates that verbs are pivotal in legal information extraction tasks due to their semantic and argumentative properties. However, granting computers the ability to interpret the meaning of a verb and its semantic properties in relation to a given context can be considered as a challenging task, mainly due to the polysemic a...
Preprint
Full-text available
Wordle, a word-guessing game, rose to global popularity in January 2022. The goal of the game is to guess a five-letter English word within six tries. Each try provides the player with hints by means of colour changing tiles which inform whether or not a given character is part of the solution as well as, in cases where it is part of the solu...
Article
Full-text available
When lawyers and legal officers are working on a new legal case, they are supposed to have properly studied prior cases similar to the current case, as the prior cases can provide valuable information which can have a direct impact on the outcomes of the current court case. Therefore, developing methodologies which are capable of automatically extract...
Article
Full-text available
With the success of large-scale pre-training and multilingual modeling in Natural Language Processing (NLP), recent years have seen a proliferation of large, Web-mined text datasets covering hundreds of languages. We manually audit the quality of 205 language-specific corpora released with five major public datasets (CCAligned, ParaCrawl, WikiMatri...
Preprint
Full-text available
The relationship between Facebook posts and the corresponding reaction feature is an interesting subject to explore and understand. To achieve this end, we test state-of-the-art Sinhala sentiment analysis models against a data set containing a decade worth of Sinhala posts with millions of reactions. For the purpose of establishing benchmarks and w...
Preprint
Full-text available
The Facebook network allows its users to record their reactions to text via a typology of emotions. This network, taken at scale, is therefore a prime data set of annotated sentiment data. This paper uses millions of such reactions, derived from a decade worth of Facebook post data centred around a Sri Lankan context, to model an eye of the beholde...
Preprint
Full-text available
The advancement of Natural Language Processing (NLP) is spreading through various domains in forms of practical applications and academic interests. Inherently, the legal domain contains a vast amount of data in text format. Therefore it requires the application of NLP to cater to the analytically demanding needs of the domain. Identifying importan...
Chapter
Full-text available
Legal information retrieval holds significant importance for lawyers and legal professionals. Its significance has grown as a result of the vast and rapidly increasing amount of legal documents available via electronic means. Legal documents, which can be considered flat file databases, contain information that can be used in a variety of ways, in...
Preprint
Full-text available
With the success of large-scale pre-training and multilingual modeling in Natural Language Processing (NLP), recent years have seen a proliferation of large, web-mined text datasets covering hundreds of languages. However, to date there has been no systematic analysis of the quality of these publicly available datasets, or whether the datasets actu...
Conference Paper
Full-text available
In the field of natural language processing, domain-specific information retrieval using given documents has been a prominent and ongoing research area. Automatic extraction of the legal parties (petitioner and defendant sets) involved in a legal case has a significant impact on the proceedings of legal cases. This is a study proposing a novel way...
Preprint
Full-text available
Aspect-Based Sentiment Analysis (ABSA) has been a prominent and ongoing research area across many different domains, but it is not widely discussed in the legal domain. A number of publicly available datasets for a wide range of domains usually fulfill the needs of researchers to perform their studies in the field of ABSA. To the best of our knowledge, ther...
Preprint
Full-text available
A document which elaborates opinions and arguments related to the previous court cases is known as a legal opinion text. Lawyers and legal officials have to spend considerable effort and time to obtain the required information manually from those documents when dealing with new legal cases. Hence, it provides much convenience to those individuals i...
Conference Paper
Full-text available
In the field of natural language processing, domain specific information retrieval using given documents has been a prominent and ongoing research area. The automatic extraction of the legal parties involved in a legal case has a significant impact on the proceedings of legal cases. This is a study proposing a novel way to extract the legal parties...
Preprint
Full-text available
Analyzing the sentiments of legal opinions available in Legal Opinion Texts can facilitate several use cases such as legal judgement prediction, contradictory statements identification and party-based sentiment analysis. However, the task of developing a legal domain specific sentiment annotator is challenging due to resource constraints such as la...
Article
Full-text available
Question Answering (QA) requires understanding of queries expressed in natural languages and identification of relevant information content to provide an answer. For closed-world QAs, information access is obtained by means of either context texts, or a Knowledge Base (KB), or both. KBs are human-generated schematic representations of world knowled...
Preprint
Full-text available
This paper presents two colloquial Sinhala language corpora from the language efforts of the Data, Analysis and Policy team of LIRNEasia, as well as a list of algorithmically derived stopwords. The larger of the two corpora spans 2010 to 2020 and contains 28,825,820 to 29,549,672 words of multilingual text posted by 533 Sri Lankan Facebook pages, i...
Article
Full-text available
This paper presents two colloquial Sinhala language corpora from the language efforts of the Data, Analysis and Policy team of LIRNEasia, as well as a list of algorithmically derived stopwords. The larger of the two corpora spans 2010 to 2020 and contains 28,825,820 to 29,549,672 words of multilingual text posted by 533 Sri Lankan Facebook pages, i...
Article
Full-text available
Information available in court case transcripts, which describe the proceedings of previous legal cases, is of significant importance to legal officials. Therefore, automatic information extraction from court case transcripts can be considered a task of huge importance when it comes to facilitating the processes related to the legal domain....
Conference Paper
Full-text available
Unplanned intensive care unit (ICU) readmissions and in-hospital mortality of patients are two important metrics for evaluating the quality of hospital care. Identifying patients with higher risk of readmission to ICU or of mortality can not only protect those patients from potential dangers, but also reduce the high costs of healthcare. In this w...
Chapter
Full-text available
Semantic oppositeness is the natural counterpart of the highly popular natural language processing concept, semantic similarity. Much like how semantic similarity is a measure of the degree to which two concepts are similar, semantic oppositeness yields the degree to which two concepts would oppose each other. This complementary nature has resulted i...
Preprint
Full-text available
Arguments, counter-arguments, facts, and evidence obtained via documents related to previous court cases are essential to legal professionals. Therefore, the process of automatic information extraction from documents containing legal opinions related to court cases can be considered to be of significant importance. This study is focused on...
Preprint
Full-text available
Natural Language Processing (NLP) is a broad umbrella of technologies used for computationally studying large amounts of text and extracting meaning - both syntactic and semantic information. Software using NLP technologies, if engineered for that purpose, generally has the advantage of being able to process large amounts of text at rates greater...
Preprint
Full-text available
Natural Language Processing (NLP) is a broad umbrella of technologies used for computationally studying large amounts of text and extracting meaning - both syntactic and semantic information. Software using NLP technologies, if engineered for that purpose, generally has the advantage of being able to process large amounts of text at rates greater...
Preprint
Full-text available
Large scale knowledge graph embedding has attracted much attention from both academia and industry in the field of Artificial Intelligence. However, most existing methods concentrate solely on fact triples contained in the given knowledge graph. Inspired by the fact that logic rules can provide a flexible and declarative language for expressing ric...
Chapter
Domain-specific information retrieval has been a prominent and ongoing research area in the field of natural language processing. Many researchers have incorporated different techniques to overcome the technical and domain specificity and provide a mature model for various domains of interest. The main bottleneck in these studies is the heavy co...
Preprint
Full-text available
This study proposes a novel way of identifying the sentiment of the phrases used in the legal domain. The added complexity of the language used in law, and the inability of the existing systems to accurately predict the sentiments of words in law are the main motivations behind this study. This is a transfer learning approach, which can be used for...
Preprint
Full-text available
Case Law has a significant impact on the proceedings of legal cases. Therefore, the information that can be obtained from previous court cases is valuable to lawyers and other legal officials when performing their duties. This paper describes a methodology of applying discourse relations between sentences when processing text documents related to t...
Conference Paper
Full-text available
Case Law has a significant impact on the proceedings of legal cases. Therefore, the information that can be obtained from previous court cases is valuable to lawyers and other legal officials when performing their duties. This paper describes a methodology of applying discourse relations between sentences when processing text documents related to t...
Article
Full-text available
An ontology defines a set of representational primitives which model a domain of knowledge or discourse. With emerging fields such as information extraction and knowledge management, the role of ontology has become a driving factor of many modern day systems. Ontology population, on the other hand, is an inherently problematic process, as it need...
Preprint
Full-text available
Domain-specific information retrieval has been a prominent and ongoing research area in the field of natural language processing. Many researchers have incorporated different techniques to overcome the technical and domain specificity and provide a mature model for various domains of interest. The main bottleneck in these studies is the heavy co...
Article
Question retrieval, which aims to find similar versions of a given question, is playing a pivotal role in various question answering (QA) systems. This task is quite challenging, mainly in regard to five aspects: synonymy, polysemy, word order, question length, and data sparsity. In this article, we propose a unified framework to simultaneously han...
Conference Paper
Full-text available
This study proposes a novel way of identifying the sentiment of the phrases used in the legal domain. The added complexity of the language used in law, and the inability of the existing systems to accurately predict the sentiments of words in law are the main motivations behind this study. This is a transfer learning approach, which can be used for...
Conference Paper
Semantic similarity measures are an important part of Natural Language Processing tasks. However, semantic similarity measures built for general use do not perform well within specific domains. Therefore, in this study, we introduce a domain specific semantic similarity measure that was created by the synergistic union of word2vec, a word embedding me...
Presentation
Full-text available
Sensing Puns is its Own Reword: Automatic Detection of Paronomasia
Preprint
Full-text available
In many modern day systems such as information extraction and knowledge management agents, ontologies play a vital role in maintaining the concept hierarchies of the selected domain. However, ontology population has become a problematic process due to its nature of heavy coupling with manual human intervention. With the use of word embeddings in th...