Alasdair J. G. Gray

  • PhD in Computer Science
  • Data Consultant at TPXimpact

About

128
Publications
32,671
Reads
17,863
Citations
Introduction
My research focuses on practical data management and its application in information systems. This is highly relevant to facing the challenges of Big Data. Principally, I work on the challenges of integrating data from heterogeneous sources, where the heterogeneity can arise from the way the data is modelled (variety), represented (variety), or from its rate of change (velocity). The result is a coherent representation of the data that enables analytical or other processing.
Current institution
TPXimpact
Current position
  • Data Consultant
Additional affiliations
September 2013 - present
Heriot-Watt University
Position
  • Lecturer
February 2009 - September 2013
University of Manchester
Position
  • Research Associate
July 2007 - January 2009
University of Glasgow
Position
  • Research Associate

Publications (128)
Article
Full-text available
Wikidata is a collaborative multi-purpose Knowledge Graph (KG) with the unique feature of adding provenance data to the statements of items as a reference. More than 73% of Wikidata statements have provenance metadata; however, few studies exist on the referencing quality in this KG, focusing only on the relevancy and trustworthiness of external so...
Article
Full-text available
The digitalisation of the regulatory compliance process has been an active area of research for several decades. More recently, however, the level of activity in this area has increased considerably. In the UK, the tragic Grenfell fire in 2017 has been a major catalyst for this as a result of the Hackitt report's recommendations pointi...
Article
Full-text available
Wikidata is a massive Knowledge Graph (KG), including more than 100 million data items and nearly 1.5 billion statements, covering a wide range of topics such as geography, history, scholarly articles, and life science data. The large volume of Wikidata is difficult to handle for research purposes; many researchers cannot afford the costs of hostin...
Article
Full-text available
Since 2014, “Bring Your Own Data” workshops (BYODs) have been organised to inform people about the process and benefits of making resources Findable, Accessible, Interoperable, and Reusable (FAIR, and the FAIRification process). The BYOD workshops’ content and format differ depending on their goal, context, and the background and needs of participa...
Article
Full-text available
Toxicology has been an active research field for many decades, with academic, industrial and government involvement. Modern omics and computational approaches are changing the field, from merely disease-specific observational models into target-specific predictive models. Traditionally, toxicology has strong links with other fields such as biology,...
Conference Paper
Full-text available
Regulations and test criteria for building products are captured in hundreds of interrelated documents. It can be daunting to figure out which of these documents contain information that is relevant to your building project or product. In this paper, we describe work on an Information Retrieval (IR) system that aims to search through the contents o...
Conference Paper
Full-text available
Wouldn’t it be great if we could automatically check whether a Building Information Model (BIM) complies with all the relevant building regulations? Despite a plethora of motivations and a long history of research, the Automated Compliance Checking (ACC) problem is far from solved. We argue that a general solution to ACC may not be feasible based o...
Conference Paper
Full-text available
Compliance Checking (CC) would be a lot easier if we could automatically map between (1) terms that occur in building regulations and (2) elements of buildings and building products. However, the terminology used in the regulations is vastly different from the terminology found in Building Information Models (BIM). We are therefore forced to someho...
Article
Full-text available
The COVID-19 pandemic has highlighted the need for FAIR (Findable, Accessible, Interoperable, and Reusable) data more than any other scientific challenge to date. We developed a flexible, multi-level, domain-agnostic FAIRification framework, providing practical guidance to improve the FAIRness for both existing and future clinical and molecular dat...
Article
Full-text available
The notion that data should be Findable, Accessible, Interoperable and Reusable, according to the FAIR Principles, has become a global norm for good data stewardship and a prerequisite for reproducibility. Nowadays, FAIR guides data policy actions and professional practices in the public and private sectors. Despite such global endorsements, howeve...
Preprint
Bioschemas is a grassroots community effort to improve FAIRness of resources in the Life sciences by defining specific Life Science metadata schemas and exposing that metadata from resources that have adopted it. Now that some initial types have been adopted directly into schema.org, an improved mechanism is required to reignite community engagemen...
Chapter
Full-text available
The Internet is the most complex machine humankind has ever built, and how to defend it from intrusions is even more complex. With the ever-increasing number of new intrusions, intrusion detection tasks rely more and more on Artificial Intelligence. Interpretability and transparency of the machine learning model are the foundation of trust in AI-driven int...
Preprint
Schema.org and Bioschemas are lightweight vocabularies that aim at making the contents of web pages machine-readable so that software agents can consume that content and understand it in an actionable way. Due to the time needed to process each page, extracting markup by visiting each page of a site is not practical for huge sites. This approach im...
Article
Full-text available
Despite progress in the development of standards for describing and exchanging scientific information, the lack of easy-to-use standards for mapping between different representations of the same or similar objects in different databases poses a major impediment to data integration and interoperability. Mappings often lack the metadata needed to be...
Article
Full-text available
To advance biomedical research, increasingly large amounts of complex data need to be discovered and integrated. This requires syntactic and semantic validation to ensure shared understanding of relevant entities. This article describes the ELIXIR biovalidator, which extends the syntactic validation of the widely used AJV library with ontology-base...
Preprint
The promise of Bioschemas is that it makes consuming data from multiple resources more straightforward. However, this hypothesis has not been tested by conducting a large scale harvest of deployed markup and making this available for others to reuse. Therefore, the goal of this hackathon project is to harvest a collection of Bioschemas markup from...
Preprint
Full-text available
Despite progress in the development of standards for describing and exchanging scientific information, the lack of easy-to-use standards for mapping between different representations of the same or similar objects in different databases poses a major impediment to data integration and interoperability. Mappings often lack the metadata needed to be...
Article
Full-text available
Toxicology has been an active research field for many decades, with academic, industrial and government involvement. Modern omics and computational approaches are changing the field, from merely disease-specific observational models into target-specific predictive models. Traditionally, toxicology has strong links with other fields such as biology,...
Conference Paper
Full-text available
Wikidata is the only general-purpose open knowledge graph with the capability of specifying references for every single statement. Currently, about 68% of Wikidata statements have at least one reference but the quality of these references is rarely covered in data quality studies. There is also a lack of a comprehensive framework for evaluating ref...
Preprint
Full-text available
Automated Compliance Checking (ACC) systems aim to semantically parse building regulations to a set of rules. However, semantic parsing is known to be hard and requires large amounts of training data. The complexity of creating such training data has led to research that focuses on small sub-tasks, such as shallow parsing or the extraction of a lim...
Preprint
Full-text available
Automatic network management driven by Artificial Intelligence technologies has been hotly discussed for decades. However, current reports mainly focus on theoretical proposals and architecture designs; work on practical implementations on real-life networks is yet to appear. This paper presents our effort toward the implementation of knowledge...
Preprint
One of the goals of the ELIXIR Intrinsically Disordered Protein (IDP) community is to create a registry called IDPcentral. The registry will aggregate data contained in the community's specialist data sources such as DisProt, MobiDB, and Protein Ensemble Database (PED) so that proteins that are known to be intrinsically disordered can be discovered; w...
Conference Paper
Full-text available
Wikidata is a general-purpose knowledge graph covering a wide variety of topics with content being crowd-sourced through an open wiki. There are now over 90M interrelated data items in Wikidata which are accessible through a public query endpoint and data dumps. However, execution timeout limits and the size of data dumps make it difficult to use t...
Preprint
Full-text available
Knowledge graphs have been successfully adopted by academia, government and industry to represent large-scale knowledge bases. Open and collaborative knowledge graphs such as Wikidata capture knowledge from different domains and harmonize them under a common format, making it easier for researchers to access the data while also supporting Open Sci...
Preprint
Full-text available
In this paper, we present SARA, a Semantic Access point Resource Allocation service for heterogeneous wireless networks with various wireless access technologies existing together. By automatically reasoning on the knowledge base of the full system provided by a knowledge-based autonomic network management system (SEANET), SARA selects the access p...
Article
Offshore decommissioning represents significant business opportunities for oil and gas service companies. However, for owners of offshore assets and regulators, it is a liability because of the associated costs. One way of mitigating decommissioning costs is through the sales and reuse of decommissioned items. To achieve this effectively, reliabili...
Conference Paper
Nanopublications are a granular way of publishing scientific claims together with their associated provenance and publication information. More than 10 million nanopublications have been published by a handful of researchers covering a wide range of topics within the life sciences. We were motivated to replicate an existing analysis of these nanopu...
Chapter
Full-text available
The TOUCAN project proposed an ontology for telecommunication networks with hybrid technologies – the TOUCAN Ontology (ToCo), available at http://purl.org/toco/, as well as a knowledge design pattern Device-Interface-Link (DIL) pattern. The core classes and relationships forming the ontology are discussed in detail. The ToCo ontology can describe t...
Conference Paper
Full-text available
The TOUCAN project proposed an ontology for telecommunication networks with hybrid technologies – the TOUCAN Ontology (ToCo), available at http://purl.org/toco/, as well as a knowledge design pattern Device-Interface-Link (DIL) pattern. The core classes and relationships forming the ontology are discussed in detail. The ToCo ontology can desc...
Conference Paper
Full-text available
In this paper, we present SARA, a Semantic Access point Resource Allocation service for heterogeneous wireless networks with various wireless access technologies existing together. By automatically reasoning on the knowledge base of the full system provided by a knowledge-based autonomic network management system – SEANET, SARA selects the access...
Preprint
Early detection of significant traumatic events, e.g. a terrorist attack or a ship capsizing, is important to ensure that a prompt emergency response can occur. In the modern world telecommunication systems could play a key role in ensuring a successful emergency response by detecting such incidents through significant changes in calls and access t...
Book
Full-text available
This book constitutes the refereed proceedings of the 16th Extended Semantic Web Conference, ESWC 2019, held in Portorož, Slovenia. The 39 revised full papers presented were carefully reviewed and selected from 134 submissions. The papers are organized in three tracks: research track, resources track, and in-use track and deal with the followi...
Conference Paper
Full-text available
Modern Software Defined Networking (SDN) control stacks consist of multiple abstraction and virtualization layers to enable flexibility in the development of new control features. Rich data modeling frameworks are essential when sharing information across control layers. Unfortunately, existing Network Operating System (NOS) data modeling capabilit...
Article
Information management during the construction phase of a built asset involves multiple stakeholders using multiple software applications to generate and store data. This is problematic as data comes in different forms and is labour intensive to piece together. Existing solutions to this problem are predominantly in proprietary applications, which...
Technical Report
Full-text available
Data Science is increasingly being accepted as one of the crucial technologies for the wellbeing and prosperity of nations. As Data Science can be the source and enabler of large-scale social and commercial change, finding the research challenges in Data Science is now a critical aspect for most scientific research and businesses. Data Science is a...
Article
Full-text available
The IUPHAR/BPS Guide to PHARMACOLOGY (GtoPdb, www.guidetopharmacology.org) and its precursor IUPHAR-DB, have captured expert-curated interactions between targets and ligands from selected papers in pharmacology and drug discovery since 2003. This resource continues to be developed in conjunction with the International Union of Basic and Clinical Ph...
Article
The IUPHAR/BPS Guide to PHARMACOLOGY (GtoPdb, www.guidetopharmacology.org) and its precursor IUPHAR-DB, have captured expert-curated interactions between targets and ligands from selected papers in pharmacology and drug discovery since 2003. This resource continues to be developed in conjunction with the International Union of Basic and Clinical Ph...
Conference Paper
Full-text available
Early detection of significant traumatic events, e.g. a terrorist attack or a ship capsizing, is important to ensure that a prompt emergency response can occur. In the modern world telecommunication systems could play a key role in ensuring a successful emergency response by detecting such incidents through significant changes in calls and access t...
Article
Full-text available
Data in the life sciences are extremely diverse and are stored in a broad spectrum of repositories ranging from those designed for particular data types (such as KEGG for pathway data or UniProt for protein data) to those that are general-purpose (such as FigShare, Zenodo, Dataverse or EUDAT). These data have widely different levels of sensitivity...
Preprint
Full-text available
Access to consistent, high-quality metadata is critical to finding, understanding, and reusing scientific data. However, while there are many relevant vocabularies for the annotation of a dataset, none sufficiently captures all the necessary metadata. This prevents uniform indexing and querying of dataset repositories. Towards providing a practical...
Preprint
Access to consistent, high-quality metadata is critical to finding, understanding, and reusing scientific data. However, while there are many relevant vocabularies for the annotation of a dataset, none sufficiently captures all the necessary metadata. This prevents uniform indexing and querying of dataset repositories. Towards providing a practical...
Preprint
Full-text available
Data in the life sciences are extremely diverse and are stored in a broad spectrum of repositories ranging from those designed for particular data types (such as KEGG for pathway data or UniProt for protein data) to those that are general-purpose (such as FigShare, Zenodo, Dataverse or EUDAT). These data have widely different levels of sensitivity...
Article
Full-text available
Powerful new social science data resources are emerging. One particularly important source is administrative data, which were originally collected for organisational purposes but often contain information that is suitable for social science research. In this paper we outline the concept of reproducible research in relation to micro-level administra...
Preprint
Full-text available
Data in the life sciences are extremely diverse and are stored in a broad spectrum of repositories ranging from those designed for particular data types (such as KEGG for pathway data or UniProt for protein data) to those that are general-purpose (such as FigShare, Zenodo, or EUDat). These data have widely different levels of sensitivity and securi...
Article
Full-text available
Access to consistent, high-quality metadata is critical to finding, understanding, and reusing scientific data. However, while there are many relevant vocabularies for the annotation of a dataset, none sufficiently captures all the necessary metadata. This prevents uniform indexing and querying of dataset repositories. Towards providing a practical...
Preprint
Full-text available
Access to consistent, high-quality metadata is critical to finding, understanding, and reusing scientific data. However, while there are many relevant vocabularies for the annotation of a dataset, none sufficiently captures all the necessary metadata. This prevents uniform indexing and querying of dataset repositories. Towards providing a practical...
Article
Full-text available
There is an urgent need to improve the infrastructure supporting the reuse of scholarly data. A diverse set of stakeholders—representing academia, industry, funding agencies, and scholarly publishers—have come together to design and jointly endorse a concise and measurable set of principles that we refer to as the FAIR Data Principles. The intent...
Article
Full-text available
Data in the life sciences are extremely diverse and are stored in a broad spectrum of repositories ranging from those designed for particular data types (such as KEGG for pathway data or UniProt for protein data) to those that are general-purpose (such as FigShare, Zenodo, or EUDat). These data have widely different levels of sensitivity and securi...
Book
The two-volume set LNCS 9981 and 9982 constitutes the refereed proceedings of the 15th International Semantic Web Conference, ISWC 2016, which was held in Kobe, Japan, in October 2016. The 75 full papers presented in these proceedings were carefully reviewed and selected from 326 submissions. The International Semantic Web Conference is the premier...
Book
The two-volume set LNCS 9981 and 9982 constitutes the refereed proceedings of the 15th International Semantic Web Conference, ISWC 2016, which was held in Kobe, Japan, in October 2016. The 75 full papers presented in these proceedings were carefully reviewed and selected from 326 submissions. The International Semantic Web Conference is the premier...
Conference Paper
Full-text available
Validata is an online web application for validating an RDF document against a set of constraints. This is useful for data exchange applications or ensuring conformance of an RDF dataset against a community agreed standard. Constraints are expressed as a Shape Expression (ShEx) schema. Validata extends the ShEx functionality to support multiple req...
Technical Report
The HCLS Profile is a community standard for describing datasets in the Health Care and Life Sciences domain. Contributed to describing dataset statistics and several other issues.
Conference Paper
Full-text available
When are two entries about a small molecule in different datasets the same? If they have the same drug name, chemical structure, or some other criteria? The choice depends upon the application to which the data will be put. However, existing Linked Data approaches provide a single global view over the data with no way of varying the notion of equiv...
Article
Full-text available
Use of medical terminologies and mappings across them is considered to be a crucial prerequisite for achieving interoperable eHealth applications. Building upon the outcomes of several research projects, we introduce a framework for evaluating and utilizing terminology mappings that offers a platform for i) performing various mapping strategies, ii)...
Article
Full-text available
Linked data systems rely on the quality of, and linking between, their data sources. However, existing data is difficult to trace to its origin and provides no provenance for links. This article discusses the need for self-describing linked data.
Article
Full-text available
Wireless sensor networks enable cost-effective data collection for tasks such as precision agriculture and environment monitoring. However, the resource-constrained nature of sensor nodes, which often have both limited computational capabilities and battery lifetimes, means that applications that use them must make judicious use of these resources....
Conference Paper
Full-text available
Wireless sensor networks enable cost-effective data collection for tasks such as precision agriculture and environment monitoring. However, the resource-constrained nature of sensor nodes, which often have both limited computational capabilities and battery lifetimes, means that applications that use them must make judicious use of these resources....
Article
Data integration is a key challenge faced in pharmacology, where there are numerous heterogeneous databases spanning multiple domains (e.g. chemistry and biology). To address this challenge, the Open PHACTS consortium has developed the Open PHACTS Discovery Platform that leverages Linked Data to provide integrated access to pharmacology databases. Be...
Article
The Open PHACTS VoID Editor helps non-Semantic Web experts to create machine interpretable descriptions for their datasets. The web app guides the user, an expert in the domain of the data, through a series of questions to capture details of their dataset and then generates a VoID dataset description. The generated dataset description conforms to t...
Chapter
Use of medical terminologies and mappings across them is considered to be a crucial prerequisite for achieving interoperable eHealth applications. Building upon the outcomes of several research projects, we introduce a framework for evaluating and utilizing terminology mappings that offers a platform for i) performing various mapping strategies, ii)...
Conference Paper
Full-text available
The Open PHACTS Explorer is a web application that supports drug discovery via the Open PHACTS API without requiring knowledge of SPARQL or the RDF data being searched. It provides a UI layer on top of the Open PHACTS linked data cache and also provides a JavaScript library to facilitate easy access to the Open PHACTS API.
Article
Full-text available
Provenance is a critical ingredient for establishing trust of published scientific content. This is true whether we are considering a data set, a computational workflow, a peer-reviewed publication or a simple scientific claim with supportive evidence. Existing vocabularies such as DC Terms and the W3C PROV-O are domain-independent and general-purp...
Conference Paper
Full-text available
This paper presents the rules used within the Open PHACTS (http://www.openphacts.org) Identity Management Service to compute co-reference chains across multiple datasets. The web of (linked) data has encouraged a proliferation of identifiers for the concepts captured in datasets; with each dataset using their own identifier. A key data integration...
Conference Paper
The Open PHACTS Discovery Platform aims to provide an integrated information space to advance pharmacological research in the area of drug discovery. Effective drug discovery requires comprehensive data coverage, i.e. integrating all available sources of pharmacology data. While many relevant data sources are available on the linked open d...
Article
Growing demand for food is driving the need for higher crop yields globally. Correctly anticipating the onset of damaging crop diseases is essential to achieve this goal. Considerable efforts have been made recently to develop early warning systems. However, these methods lack a direct and online measurement of the spores that attack crops. A novel...
Conference Paper
Full-text available
Background / Purpose: Within complex scientific domains such as pharmacology, operational equivalence between two concepts is often context-, user- and task-specific. For example, searches for the chemical Fluvastatin on ChemSpider and DrugBank return different compounds: although their basic chemical structure matches, the compounds differ in th...
Article
Linked data relies on instance level links between potentially differing representations of concepts in multiple datasets. However, in large complex domains, such as pharmacology, the inter-relationship of data instances needs to consider the context (e.g. task, role) of the user and the assumptions they want to apply to the data. Such context is n...
Article
Full-text available
The discovery of new medicines requires pharmacologists to interact with a number of information sources ranging from tabular data to scientific papers, and other specialized formats. In this application report, we describe a linked data platform for integrating multiple pharmacology datasets that form the basis for several drug discovery applicati...
Chapter
Full-text available
Within complex scientific domains such as pharmacology, operational equivalence between two concepts is often context-, user- and task-specific. Existing Linked Data integration procedures and equivalence services do not take the context and task of the user into account. We present a vision for enabling users to control the notion of operational e...
Article
Full-text available
Sensing devices are increasingly being deployed to monitor the physical world around us. One class of application for which sensor data is pertinent is environmental decision support systems, e.g., flood emergency response. For these applications, the sensor readings need to be put in context by integrating them with other sources of data about the...
Article
Full-text available
This document specifies a standard format for vocabularies based on the W3C's Resource Description Framework (RDF) and Simple Knowledge Organization System (SKOS). By adopting a standard and simple format, the IVOA will permit different groups to create and maintain their own specialised vocabularies while letting the rest of the astronomical commu...
Conference Paper
Full-text available
The SNEE query optimizer enables users to characterize data requests against wireless sensor networks (WSNs), using a declarative query language called SNEEql (SNEE for Sensor NEtwork Engine, described in [GBG+11], and publicly available at http://code.google.com/p/snee). Queries are compiled into imperative query execution plans, which are transl...
Conference Paper
Full-text available
Sensor networks have received considerable attention recently, as they provide manifold benefits. Not only are they a means for data acquisition and monitoring of unexplored or inaccessible areas, they are also a low-cost alternative for sensing the environment, which greatly helps us to better understand our surroundings. A major motivation in either...
Article
Full-text available
Sensor networks have become ubiquitous and their proliferation in day-to-day life provides new research challenges. Sensors deployed at forest sites, high performance facilities, or areas stricken by environmental, or other, phenomena, are only a few representative examples. More recently, mobile sensor networks have made their presence and are rapi...
Conference Paper
Full-text available
Sensing devices are increasingly being deployed to monitor the physical world around us. One class of application for which sensor data is pertinent is environmental decision support systems, e.g. flood emergency response. However, in order to interpret the readings from the sensors, the data needs to be put in context through correlation with other...
Presentation
Full-text available
Tutorial at ESWC 2011: Building Semantic Sensor Webs and Applications
Article
Full-text available
A wireless sensor network (WSN) can be construed as an intelligent, large-scale device for observing and measuring properties of the physical world. In recent years, the database research community has championed the view that if we construe a WSN as a database (i.e., if a significant aspect of its intelligent behavior is that it can execute declara...
Book
This book constitutes the thoroughly refereed post-conference proceedings of the 28th British National Conference on Databases, BNCOD 28, held in Manchester, UK, in July 2011. The 13 revised full papers, 2 short papers, 2 demo papers and 1 poster paper presented together with the abstracts of 2 keynote talks and 1 tutorial paper were carefully revi...
Article
This document specifies, designs, and validates the Semantic Sensor Grid Rapid Application Development for Environmental Management (SemSorGrid4Env) software architecture. The architecture enables the publication and querying of both stored (e.g. database) and streaming (e.g. sensor) data to support the rapid development of applications for environ...
Conference Paper
Full-text available
The availability of streaming data sources is progressively increasing thanks to the development of ubiquitous data capturing technologies such as sensor networks. The heterogeneity of these sources introduces the requirement of providing data access in a unified and coherent manner, whilst allowing the user to express their needs at an ontological...
Article
Astronomy, like many domains, already has several sets of terminology in general use, referred to as controlled vocabularies. For example, the keywords for tagging journal articles, or the taxonomy of terms used to label image files. These existing vocabularies can be encoded into SKOS, a W3C proposed recommendation for representing vocabularies on...
Conference Paper
The British National Conference on Databases (BNCOD), now in its 27th edition, has covered a broad range of database research topics: from the purely theoretical, to more application-oriented subjects. It has proved to be a forum for intellectual debate, and has fostered a sense of community amongst British and overseas database researchers. Databa...
Article
There are multiple vocabularies and thesauri within astronomy, of which the best known are the 1993 IAU Thesaurus and the keyword list maintained by A&A, ApJ and MNRAS. The IVOA has agreed on a standard for publishing vocabularies, based on the W3C SKOS standard, to allow greater automated interaction with them, in particular on the Web. This allow...
