About
38 Publications
11,871 Reads
580 Citations
Publications (38)
Industry 4.0 (I4.0) is a new era in the industrial revolution that emphasizes machine connectivity, automation, and data analytics. The I4.0 pillars such as autonomous robots, cloud computing, horizontal and vertical system integration, and the industrial internet of things have increased the performance and efficiency of production lines in the ma...
We present the Linked SPARQL Queries (LSQ) dataset, which currently describes 43.95 million executions of 11.56 million unique SPARQL queries extracted from the logs of 27 different endpoints. The LSQ dataset provides RDF descriptions of each such query, which are indexed in a public LSQ endpoint, allowing interested parties to find queries with th...
A key property of Linked Data is the representation and publication of data as interconnected labelled graphs, where different resources linked to each other form a network of meaningful information. Searching for these important relationships between resources, within single or distributed graphs, can be reduced to a pathfinding or navigation problem, i....
In this paper, we propose a domain-agnostic and query-driven approach to monitor, assess, and analyze the quality of the linked data hosted by public SPARQL endpoints. We identified various quality-related metrics for linked datasets and used linked data vocabulary to represent quality information. We provide a Linked Data Quality (LDQ) dataset, which...
Microbenchmarks are used to test the individual components of a given system. Thus, such benchmarks can provide a more detailed analysis of the different components of the system. We present a demo of QaldGen [5], a framework for generating question samples for microbenchmarking of Question Answering (QA) systems over Knowledge...
A key property of Linked Data, i.e., the Web-based representation and publication of data as interconnected labelled graphs, is that it enables querying and navigating through datasets distributed across the network. SPARQL 1.1, the current standard query language for RDF-based Linked Data, defines a construct, called Property Paths (PP), to navigat...
Triplestores are data management systems for storing and querying RDF data. Over recent years, various benchmarks have been proposed to assess the performance of triplestores across different performance measures. However, choosing the most suitable benchmark for evaluating triplestores in practical settings is not a trivial task. This is because t...
Visualization of Gene Expression (GE) is a challenging task since the number of genes and their associations are difficult to predict across various sets of biological studies. GE could be used to understand tissue-gene-protein relationships. Currently, heatmaps are the standard visualization technique to depict GE data. However, heatmaps only cover the...
A mishap in anti-cancer drug distribution is critical in breast cancer patients owing to poor prediction models for identifying the treatment regime in ER+ve and ER-ve patients. The traditional method for the prediction depends on the change in expression across the normal-disease pair. However, it certainly misses the multidimensional aspect and underlyi...
Visualization of Gene Expression (GE) is a challenging task since the number of genes and their associations are difficult to predict across various sets of biological studies. GE could be used to understand tissue-gene-protein relationships. Currently, the heat map is the standard visualization technique to depict GE data. However, the heat map only covers the...
In this demo paper, we present the interface of the SQCFramework [8], a SPARQL query containment benchmark generation framework. SQCFramework is able to generate customized SPARQL containment benchmarks from real SPARQL query logs. To this end, the framework makes use of different clustering techniques. It is flexible enough to generate benchmarks...
Traversing paths within a graph is a well-studied problem that is highly intractable, especially with large-scale graphs. In the case of multiple graphs, the standard practice is to merge distinct graphs in a centralised way to evaluate the existence of paths between given entities (or nodes). In the biomedical domain, counting and retrieving the number...
Access to hundreds of knowledge bases has been made available on the Web through public SPARQL endpoints. Unfortunately, few endpoints publish descriptions of their content (e.g., using VoID). It is thus unclear how agents can learn about the content of a given SPARQL endpoint or, relatedly, find SPARQL endpoints with content relevant to their need...
Access to hundreds of knowledge bases has been made available on the Web through SPARQL endpoints. Unfortunately, few endpoints publish descriptions of their content. It is thus unclear how agents can learn about the content of a given endpoint. This research investigates the feasibility of a system that gathers information about public endpoints b...
Background: Next Generation Sequencing (NGS) is playing a key role in therapeutic decision making for cancer prognosis and treatment. NGS technologies are producing a massive amount of sequencing datasets. Often, these datasets are published from isolated and different sequencing facilities. Consequently, the process of sharing and aggre...
Query containment is a fundamental problem in data management with its main application being in global query optimization. A number of SPARQL query containment solvers have been developed recently. To the best of our knowledge, the Query Containment Benchmark (QC-Bench) is the only benchmark for evaluating these containment solvers. However...
While graph data on the Web and represented in RDF is growing, SPARQL, as the standard query language for RDF, still remains largely unusable for the most typical graph query task: finding paths between selected nodes through the graph. Property Paths, as introduced in SPARQL 1.1, turn out to be unfit for this task, as they can only be used for testing p...
A single interface for accessing life sciences (LS) data is a natural need to master the data deluge in this domain. The data in the LS requires integration and current integrative solutions increasingly rely on the federation of queries for distributed resources. This paper demonstrates BioFed, a federated SPARQL query processing system customised...
Background: Biomedical data, e.g., from knowledge bases and ontologies, is increasingly published following linked data principles, preferably as triple data according to the RDF standards. This is a necessary step towards unified access to biological data sets, but this still requires solutions to query multiple endpoints for their heterogeneous data...
Background: Several query federation engines have been proposed for accessing public Linked Open Data sources. However, in many domains, resources are sensitive and access to these resources is tightly controlled by stakeholders; consequently, privacy is a major concern when federating queries over such datasets. In the Healthcare and Life Sciences...
There are hundreds of SPARQL endpoints on the Web, but finding an endpoint relevant to a client's needs is difficult: each endpoint acts like a black box, often without a description of its content. Herein we briefly describe Sportal: a system that collects metadata about the content of endpoints and gathers it into a central catalogue over whi...
Cancer is a disease of biological and cell cycle processes, driven by the dosage of a limited set of drugs, resistance, mutations, and side effects. The identification of such a limited set of drugs and their targets, pathways, and effects based on large-scale, multi-omics, multi-dimensional datasets is one of the key challenges in data-driven cancer...
Access to hundreds of knowledge bases has been made available on the Web through public SPARQL endpoints. Unfortunately, few endpoints publish descriptions of their content (e.g., using VoID). It is thus unclear how agents can learn about the content of a given SPARQL endpoint or, relatedly, find SPARQL endpoints with content relevant to their need...
In this paper, we present LSQ – a Linked Dataset describing SPARQL queries extracted from the logs of four prominent public SPARQL endpoints. We argue that this dataset has a variety of uses for the SPARQL research community, be it to generate custom benchmarks, conduct analyses of SPARQL adoption, see how agents interact with endpoints, find queri...
Benchmarking is indispensable when aiming to assess technologies with respect to their suitability for given tasks. While several benchmarks and benchmark generation frameworks have been developed to evaluate triple stores, they mostly provide a one-fits-all solution to the benchmarking problem. This approach to benchmarking is however unsuitable t...
Health care and life sciences research relies heavily on the ability to search, discover, formulate and correlate data from distinct sources. Although the Semantic Web and Linked Data technologies help in dealing with the data integration problem, there remains a barrier to adopting these for non-technical research audiences. In this demo, we present FedViz...
Benchmarking is indispensable when aiming to assess technologies with respect to their suitability for given tasks. In this demo, we present the interface of FEASIBLE, an automatic approach for the generation of benchmarks out of the sets of queries. The generation is achieved by selecting prototypical queries of a user-defined size from an input s...
With this poster, we will present the Linked SPARQL Query (LSQ) dataset, which describes SPARQL queries taken from the logs of public endpoints. We introduce the initial four query logs that we have taken and the extraction process applied: the types of meta-data captured, how the data are modelled, what vocabularies we use, etc. The LSQ dataset cu...
We present LSQ: a Linked Dataset describing SPARQL queries extracted from the logs of public SPARQL endpoints. We argue that LSQ has a variety of uses for the SPARQL research community, be it for example to generate custom benchmarks or conduct analyses of SPARQL adoption. We introduce the LSQ data model used to describe SPARQL query executions as...
Health care and life sciences research relies heavily on the ability to search, discover, formulate and correlate data from distinct sources. Over the last decade, the deluge of health care and life science data and the standardisation of linked data technologies have resulted in the publication of datasets of great importance. This emerged as an opportunity to expl...
A significant portion of the Web of Data is composed of multiple datasets that add high value to biomedical research. These datasets have been exposed on the web as a part of the Life Sciences Linked Open Data (LSLOD) Cloud. Different initiatives have been proposed for navigating through these datasets with or without vocabulary reuse. The significance...
In this report, we present LSQ – a Linked Dataset describing SPARQL queries extracted from the logs of four prominent public SPARQL endpoints. We argue that this dataset has a variety of uses for the SPARQL research community. For example, it can be used to generate benchmarks on the fly by selecting real-world queries with specific chara...
Multiple datasets that add high value to biomedical research have been exposed on the web as a part of the Life Sciences Linked Open Data (LSLOD) Cloud. The ability to easily navigate through these datasets is crucial for personalized medicine and the improvement of the drug discovery process. However, navigating these multiple datasets is not trivial...
Presence-based notification systems play a pivotal role in any collaborative working environment by providing near-real-time information about the status, locality and presence of the collaborators. Instant Messaging (IM) tools provide a simple, low-cost solution to support communication and collaboration in the working environment. With the wide ad...