
Nandana Mihindukulasooriya
IBM Research · MIT-IBM Watson AI Lab
PhD in Artificial intelligence
78 publications · 10,861 reads · 510 citations
Additional affiliations
July 2012 - April 2015
Publications (78)
Recent advances in large language models (LLMs) and foundation models with emergent capabilities have been shown to improve the performance of many NLP tasks. LLMs and Knowledge Graphs (KGs) can complement each other such that LLMs can be used for KG construction or completion while existing KGs can be used for different tasks such as making LLM...
Pre-trained transformer-based language models are becoming increasingly popular due to their exceptional performance on various benchmarks. However, concerns persist regarding the presence of hidden biases within these models, which can lead to discriminatory outcomes and reinforce harmful stereotypes. To address this issue, we propose Finspector,...
Knowledge graphs can represent information about the real-world using entities and their relations in a structured and semantically rich manner and they enable a variety of downstream applications such as question-answering, recommendation systems, semantic search, and advanced analytics. However, at the moment, building a knowledge graph involves...
Digital Twins are digital representations of systems in the Internet of Things (IoT) that are often based on AI models that are trained on data from those systems. Semantic models are used increasingly to link these datasets from different stages of the IoT systems life-cycle together and to automatically configure the AI modelling pipelines. This...
We propose KnowGL, a tool that allows converting text into structured relational data represented as a set of ABox assertions compliant with the TBox of a given Knowledge Graph (KG), such as Wikidata. We address this problem as a sequence generation task by leveraging pre-trained sequence-to-sequence language models, e.g. BART. Given a sentence, we...
A research division plays an important role in driving innovation in an organization. Drawing insights, following trends, keeping abreast of new research, and formulating strategies are becoming increasingly challenging for both researchers and executives as the amount of information grows in both velocity and volume. In this paper we present...
This presentation covers the state-of-the-art benchmark dataset resources as well as the best neural models for the respective task perspectives.
Knowledge bases (KBs) are often incomplete and constantly changing in practice. Yet, in many question answering applications coupled with knowledge bases, the sparse nature of KBs is often overlooked. To this end, we propose a case-based reasoning approach, CBR-iKB, for knowledge base question answering (KBQA) with incomplete-KB as our main focus....
In a recent work, we presented a novel state-of-the-art approach to zero-shot slot filling that extends dense passage retrieval with hard negatives and robust training procedures for retrieval augmented generation models. In this paper, we propose a system based on an enhanced version of this approach where we train task specific models for other k...
Knowledge Base Question Answering (KBQA) tasks that involve complex reasoning are emerging as an important research direction. However, most existing KBQA datasets focus primarily on generic multi-hop reasoning over explicit facts, largely ignoring other reasoning types such as temporal, spatial, and taxonomic reasoning. In this paper, we present a...
In recent years, a number of keyphrase generation (KPG) approaches were proposed consisting of complex model architectures, dedicated training paradigms and decoding strategies. In this work, we opt for simplicity and show how a commonly used seq2seq language model, BART, can be easily adapted to generate keyphrases from the text in a single batch...
We propose a transition-based system to transpile Abstract Meaning Representation (AMR) into SPARQL for Knowledge Base Question Answering (KBQA). This allows us to delegate part of the abstraction problem to a strongly pre-trained semantic parser, while learning transpiling with a small amount of paired data. We depart from recent work relating AMR a...
Each year the International Semantic Web Conference organizes a set of Semantic Web Challenges to establish competitions that will advance state-of-the-art solutions in some problem domains. The Semantic Answer Type and Relation Prediction (SMART) task is one of the ISWC 2021 Semantic Web challenges. This is the second year of the challenge af...
Relation linking is essential to enable question answering over knowledge bases. Although there are various efforts to improve relation linking performance, the current state-of-the-art methods do not achieve optimal results, therefore, negatively impacting the overall end-to-end question answering performance. In this work, we propose a novel appr...
Knowledge Base Question Answering (KBQA) tasks that involve complex reasoning are emerging as an important research direction. However, most KBQA systems struggle with generalizability, particularly on two dimensions: (a) across multiple reasoning types where both datasets and systems have primarily focused on multi-hop reasoning, and (b) across mul...
We present KaaPa (Knowledge Aware Answers from Pdf Analysis), an integrated solution for machine reading comprehension over both text and tables extracted from PDFs. KaaPa enables interactive question refinement using facets generated from an automatically induced Knowledge Graph. In addition it provides a concise summary of the supporting evidence...
Inferring semantic types for entity mentions within text documents is an important asset for many downstream NLP tasks, such as Semantic Role Labelling, Entity Disambiguation, Knowledge Base Question Answering, etc. Prior works have mostly focused on supervised solutions that generally operate on relatively small-to-medium-sized type systems. In th...
Noun phrases and relation phrases in open knowledge graphs are not canonicalized, leading to an explosion of redundant and ambiguous subject-relation-object triples. Existing approaches to face this problem take a two-step approach: first, they generate embedding representations for both noun and relation phrases, then a clustering algorithm is use...
Knowledge base question answering (KBQA) is an important task in Natural Language Processing. Existing approaches face significant challenges including complex question understanding, necessity for reasoning, and lack of large training datasets. In this work, we propose a semantic parsing and reasoning-based Neuro-Symbolic Question Answering (NSQA)...
Each year the International Semantic Web Conference accepts a set of Semantic Web Challenges to establish competitions that will advance state-of-the-art solutions in any given problem domain. The SeMantic AnsweR Type prediction task (SMART) was part of the ISWC 2020 challenges. Question type and answer type prediction can play a key role in knowle...
IT support is a vital and integral part of technology adoption. Conventionally, IT support service providers heavily rely on human effort and expertise to respond to user queries. Given the cost-benefit and 24×7 availability for answering user questions, Virtual Assistants (VA) are highly applicable in the technical support domain. In t...
Knowledge base question answering systems are heavily dependent on relation extraction and linking modules. However, the task of extracting and linking relations from text to knowledge bases faces two primary challenges: the ambiguity of natural language and lack of training data. To overcome these challenges, we present SLING, a relation linking fra...
This paper introduces Strict Partial Order Networks (SPON), a novel neural network architecture designed to enforce asymmetry and transitive properties as soft constraints. We apply it to induce hypernymy relations by training with is-a pairs. We also present an augmented variant of SPON that can generalize type information learned for in-vocabular...
The use of information and communication technologies facilitates energy management (EM) at both district and building levels, but also generates a considerable amount of data. To gain insights into such data, it is essential to resolve the cross-domain data interoperability problem and determine an approach to exchange performance information and...
Assessing the quality of an evolving knowledge base is a challenging task as it often requires identifying the correct quality assessment procedures. Since data is often derived from autonomous and increasingly large data sources, it is impractical to manually curate the data, and challenging to continuously and automatically assess their quality. In...
Knowledge Base Population (KBP) is an important problem in Semantic Web research and a key requirement for successful adoption of semantic technologies in many applications. In this paper we present Socrates, a deep learning based solution for Automated Knowledge Base Population from Text. Socrates does not require manual annotations which would ma...
Knowledge bases are nowadays essential components for any task that requires automation with some degree of intelligence. Assessing the quality of a knowledge base is a complex task as it often means measuring the quality of structured information, ontologies and vocabularies, and queryable endpoints. Popular knowledge bases such as DBpedia, YAGO2...
Spain is a hot spot for European tourism, which forms an important part of its economy. Large cities tend to monopolize this sector, unbalancing the outcome of tourism in Spain. Promoting festivals from less-known regions that belong to the Spanish cultural heritage has been proposed as a solution to balance the economy of this sector. Unfo...
Stakeholders – curator, consumer, etc. – in the tourism domain routinely need to combine and compare statistical indicators about tourism. In this context, various Knowledge Bases (KBs) have been designed and developed in the Linked Open Data (LOD) cloud in order to support the decision-making process in the tourism domain. Such KBs evolve over time: their...
Type information, which is useful for responding to many queries, plays a key role in the Semantic Web. Nevertheless, it is common that type information of some instances is not present in knowledge graphs. Thus, type prediction of a given instance using background knowledge is an important knowledge graph completion task. To this end, the objective of th...
Tourism is a crucial component of Sri Lanka’s economy. Intelligent business decisions by means of thorough analysis of relevant data can help the Sri Lankan tourism industry to be competitive. To this end, Sri Lanka Tourism Development Authority makes tourism statistics publicly available. However, they are published as PDF files limiting their reu...
As social media has redefined how people communicate and contact their networks, the importance once held by interpersonal influence and word-of-mouth has shifted toward electronic word-of-mouth (eWOM). The influence of eWOM is prominent in the tourism industry. In such a context, this study aims to examine the viability of using tourists as am...
We present DBtravel, a tourism-oriented knowledge graph generated from the collaborative travel site Wikitravel. Our approach takes advantage of the recommended guideline for contributors provided by Wikitravel and extracts the named entities available in Wikitravel Spanish entries by using an NLP pipeline. Compared to a manually annotated gold stan...
Knowledge Graphs (KGs) are becoming the core of most artificial intelligence and cognitive applications. Popular KGs such as DBpedia and Wikidata have chosen the RDF data model to represent their data. Despite the advantages, there are challenges in using RDF data, for example, data validation. Ontologies for specifying domain conceptualizations in...
DBpedia releases consist of more than 70 multilingual datasets that cover data extracted from different language-specific Wikipedia instances. The data extracted from those Wikipedia instances are transformed into RDF using mappings created by the DBpedia community. Nevertheless, not all the mappings are correct and consistent across all the distin...
Knowledge Graphs (KG) are becoming core components of most artificial intelligence applications. Linked Data, as a method of publishing KGs, allows applications to traverse within, and even out of, the graph thanks to global dereferenceable identifiers denoting entities, in the form of IRIs. However, as we show in this work, after analyzing several...
The Linked (Open) Data cloud has been growing at a rapid rate in recent years. However, the large variance of quality in its datasets is a key obstacle that hinders their use, so quality assessment has become an important aspect. Data profiling is one of the widely used techniques for data quality assessment in domains such as relational data; neve...
Most Semantic Web data is being generated from legacy datasets with the help of mappings, some of which may have been specified declaratively in languages such as R2RML or its extensions: RML and xR2RML. Most of these mappings are kept locally in each organization, and to the best of our knowledge, a shared repository that would facilitate the d...
During the last decade, the Linked Open Data cloud has grown with much enthusiasm and a lot of organizations are publishing their data as Linked Data. However, it is not evident whether enough effort has been invested in maintaining those data or ensuring their quality. Data quality, defined as “fitness for use”, is an important aspect for Linked Da...
For more than a decade, theoretical research on ontology evolution has been published in the literature and several frameworks for managing ontology changes have been proposed. However, there are fewer studies that analyze widely used ontologies that were developed in a collaborative manner to understand community-driven ontology evolution in practice...
With the increasing amount of Linked Data published on the Web, the community has recognised the importance of the quality of such data and a number of initiatives have been undertaken to specify and evaluate Linked Data quality. However, these initiatives are characterised by a high diversity in terms of the quality aspects that they address and m...
This position paper discusses the need for extending dataset descriptions, such as DCAT, in the case of RDF data to include comprehensive vocabulary usage and triple pattern information (for instance, as a DCAT profile for vocabulary usage and triple patterns in RDF data). As the basis of the discussion, the paper presents four use cases whose requ...
DBpedia extracts most of its data from Wikipedia’s infoboxes. Manually created “mappings” link infobox attributes to DBpedia ontology properties (dbo properties), producing the most used DBpedia triples. However, infobox attributes without a mapping produce triples with properties in a different namespace (dbp properties). In this position paper we poi...
DBpedia exposes data from Wikipedia as machine-readable Linked Data. The DBpedia data extraction process generates RDF data in two ways: (a) using the mappings that map the data from Wikipedia infoboxes to the DBpedia ontology and other vocabularies, and (b) using infobox-properties, i.e., properties that are not defined in the DBpedia ontology but...
The Linked Data initiative continues to grow making more datasets available; however, discovering the type of data contained in a dataset, its structure, and the vocabularies used still remains a challenge hindering the querying and reuse. VoID descriptions provide a starting point but a more detailed analysis is required to unveil the implicit voc...
Read-write Linked Data applications provide a novel alternative to application integration that helps break data silos by combining Semantic Web technologies with the REST design principles. One drawback that hinders the adoption of this approach in enterprise systems is the lack of transaction support. Transactions play a vital role in ent...
The Linked Data Platform (LDP) W3C Recommendation provides a standard protocol and a set of best practices for the development of read-write Linked Data applications based on HTTP access to Web resources that describe their state using the RDF data model. The Hydra Core Vocabulary is an initiative to define a lightweight vocabulary to describe hype...
In this demo we present LDP4ROs, a prototype implementation that allows creating, browsing and updating Research Objects (ROs) and their contents using typical HTTP operations. This is achieved by aligning the RO model with the W3C Linked Data Platform (LDP).
Enterprises are increasingly using a wide range of heterogeneous information systems for executing and governing their business activities. Even if the adoption of service orientation has improved loose coupling and reusability, applications are still isolated data silos whose integration requires complex transformations and mediations. However, by...
The W3C Linked Data Platform (LDP) candidate recommendation defines a standard HTTP-based protocol for read/write Linked Data. The W3C R2RML recommendation defines a language to map relational databases (RDBs) and RDF. This paper presents morph-LDP, a novel system that combines these two W3C standardization initiatives to expose relational data as...
The REpresentational State Transfer (REST) architectural style describes the design principles that made the World Wide Web scalable, and the same principles can be applied in an enterprise context to achieve loosely coupled and scalable application integration. In recent years, RESTful services have been gaining traction in the industry and are commonly used as...
Enterprises are increasingly using a wide range of heterogeneous information systems for executing and governing their business activities. Even if the adoption of service orientation has improved loose coupling and reusability, applications are still isolated data silos requiring complex transformation and mediation for integrating them. The W3C...
Linked Data is not always published with a license. Sometimes a wrong license type is used, like a license for software, or it is not expressed in a standard, machine-readable manner. Yet, Linked Data resources may be subject to intellectual property and database laws, may contain personal data subject to privacy restrictions or may even contain...