
Eric Prud'hommeaux, Massachusetts Institute of Technology | MIT
About
69 Publications · 14,129 Reads · 5,893 Citations
Introduction
Additional affiliations
January 1995 - present
Publications (69)
Knowledge Graphs (KGs) such as Wikidata act as a hub of information from multiple domains and disciplines, and are crowdsourced by multiple stakeholders. The vast amount of available information makes it difficult for researchers to manage the entire KG, which is also continually being edited. It is necessary to develop tools that extract subsets fo...
Urgent global research demands real-time dissemination of precise data. Wikidata, a collaborative and openly licensed knowledge graph available in RDF format, provides an ideal forum for exchanging structured data that can be verified and consolidated using validation schemas and bot edits. In this research article, we catalog an automatable task s...
Background
Knowledge graphs (KGs) play a key role to enable explainable artificial intelligence (AI) applications in healthcare. Constructing clinical knowledge graphs (CKGs) against heterogeneous electronic health records (EHRs) has been desired by the research and healthcare AI communities. From the standardization perspective, community-based st...
Knowledge graphs have successfully been adopted by academia, government and industry to represent large scale knowledge bases. Open and collaborative knowledge graphs such as Wikidata capture knowledge from different domains and harmonize them under a common format, making it easier for researchers to access the data while also supporting Open Sci...
Resource Description Framework (RDF) is one of the three standardized data formats in the HL7 Fast Healthcare Interoperability Resources (FHIR) specification and is being used by healthcare and research organizations to join FHIR and non-FHIR data. However, RDF previously had not been integrated into popular FHIR tooling packages, hindering the ado...
This study developed and evaluated a JSON-LD 1.1 approach to automate the Resource Description Framework (RDF) serialization and deserialization of Fast Healthcare Interoperability Resources (FHIR) data, in preparation for updating the FHIR RDF standard. We first demonstrated that this JSON-LD 1.1 approach can produce the same output as the current...
This chapter was originally published non-open access by mistake. This has been corrected.
We discuss Shape Expressions (ShEx), a concise, formal, modeling and validation language for RDF structures. For instance, a Shape Expression could prescribe that subjects in a given RDF graph that fall into the shape “Paper” are expected to have a section called “Abstract”, and any ShEx implementation can confirm whether that is indeed the case fo...
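The “Paper”/“Abstract” example above can be made concrete with a toy validator. The following is a minimal sketch of the ShEx idea only, in plain Python; all IRIs, property names, and the single-shape check are invented for illustration and are far simpler than what real ShEx expresses:

```python
# An RDF graph modeled as a set of (subject, predicate, object) triples.
# Prefixes like "ex:" are hypothetical, not from any real vocabulary.
graph = {
    ("ex:paper1", "rdf:type", "ex:Paper"),
    ("ex:paper1", "ex:section", "Abstract"),
    ("ex:paper2", "rdf:type", "ex:Paper"),  # missing its Abstract section
}

def conforms_to_paper_shape(graph, node):
    """Toy 'Paper' shape: the node must have an ex:section with value 'Abstract'."""
    outgoing = {(p, o) for s, p, o in graph if s == node}
    return ("ex:section", "Abstract") in outgoing

# Validate every node typed as ex:Paper against the shape.
papers = {s for s, p, o in graph if p == "rdf:type" and o == "ex:Paper"}
report = {n: conforms_to_paper_shape(graph, n) for n in sorted(papers)}
print(report)  # ex:paper1 conforms; ex:paper2 does not
```

A real ShEx implementation would of course take a schema document rather than a hard-coded check, and supports cardinalities, value sets, and recursion.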
The RDF data model forms a cornerstone of the Semantic Web technology stack. Although there have been different proposals for RDF serialization syntaxes, the underlying simple data model enables great flexibility which allows it to be successfully employed in many different scenarios and to form the basis on which other technologies are developed....
People have been using computers to record and reason about data for many decades. Typically, this reasoning is less esoteric than artificial intelligence tasks like classification.
This chapter includes a short overview of the RDF data model and the Turtle notation, as well as some technologies like SPARQL, RDF Schema, and OWL that form part of the RDF ecosystem.
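As a minimal sketch of the RDF data model described in this chapter, a graph can be represented as a set of subject–predicate–object triples, with SPARQL-style triple patterns acting as lookups with wildcards. The prefixes and data below are hypothetical:

```python
# A tiny RDF graph: each statement is one (subject, predicate, object) triple.
triples = [
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:alice", "foaf:name", "Alice"),
    ("ex:bob",   "foaf:name", "Bob"),
]

def match(triples, s=None, p=None, o=None):
    """SPARQL-style triple pattern: None behaves like a variable (matches anything)."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

# "Whom does ex:alice know?" -- analogous to:
#   SELECT ?o WHERE { ex:alice foaf:knows ?o }
print([o for _, _, o in match(triples, s="ex:alice", p="foaf:knows")])  # ['ex:bob']
```

This deliberately ignores the distinction between IRIs, blank nodes, and literals that the actual RDF model makes.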
Shapes Constraint Language (SHACL) has been developed by the W3C RDF Data Shapes Working Group, which was chartered in 2014 with the goal to “produce a language for defining structural constraints on RDF graphs” [6].
Shape Expressions (ShEx) is a schema language for describing RDF graph structures. ShEx was originally developed in late 2013 to provide a human-readable syntax for OSLC Resource Shapes. It added disjunctions, making it more expressive than Resource Shapes. Tokens in the language were adopted from Turtle [80] and SPARQL [44] with tokens for groupi...
In this chapter we describe several applications of RDF validation. We start with the WebIndex, a medium-size linked data portal that was one of the earliest applications of ShEx. We describe it using ShEx and SHACL so the reader can see how both formalisms can be applied to describe RDF data.
In this chapter we present a comparison between ShEx and SHACL. The technologies have similar goals and similar features. In fact at the start of the Data Shapes Working Group in 2014, convergence on a unified approach was considered possible. However, this did not happen and as of July 2017 both technologies are maintained as separate solutions.
We present a formal semantics and proof of soundness for shapes schemas, an expressive schema language for RDF graphs that is the foundation of Shape Expressions Language 2.0. It can be used to describe the vocabulary and the structure of an RDF graph, and to constrain the admissible properties and values for nodes in that graph. The language defin...
RDF and Linked Data have broad applicability across many fields, from aircraft manufacturing to zoology. Requirements for detecting bad data differ across communities, fields, and tasks, but nearly all involve some form of data validation. This book introduces data validation and describes its practical use in day-to-day data exchange. The Semantic...
In this paper, we present a platform known as D2Refine for facilitating clinical research study data element harmonization and standardization. D2Refine is developed on top of OpenRefine (formerly Google Refine) and leverages the simple interface and extensible architecture of OpenRefine. D2Refine empowers the tabular representation of clinical researc...
Researchers commonly use a tabular format to describe and represent clinical study data. The lack of standardization of data dictionaries' metadata elements presents challenges for their harmonization across similar studies and impedes interoperability outside the local context. We propose that representing data dictionaries in the form of standardized...
Background
HL7 Fast Healthcare Interoperability Resources (FHIR) is an emerging open standard for the exchange of electronic healthcare information. FHIR resources are defined in a specialized modeling language. FHIR instances can currently be represented in either XML or JSON. The FHIR and Semantic Web communities are developing a third FHIR insta...
Linked data portals need to be able to advertise and describe the structure of their content. A sufficiently expressive and intuitive schema language will allow portals to communicate these structures. Validation tools will aid in the publication and maintenance of linked data and increase their quality. Two schema language proposals have recently...
The OHDSI Common Data Model (CDM) is a deep information model whose vocabulary component plays a critical role in enabling consistent coding and query of clinical data. The objective of the study is to create methods and tools to expose the OHDSI vocabularies and mappings as vocabulary mapping services using two HL7 FHIR core terminolog...
A variety of data models have been developed to provide a standardized data interface that supports organizing clinical research data into a standard structure for building integrated data repositories. HL7 Fast Healthcare Interoperability Resources (FHIR) is emerging as a next-generation standards framework for facilitating health care and ele...
Recently, RDF and OWL have become the most common knowledge representation languages in use on the Web, propelled by the recommendation of the W3C. In this paper we examine an alternative way to represent knowledge based on Prototypes. This Prototype-based representation has different properties, which we argue to be more suitable for data sharing...
In recent years RDF and OWL have become the most common knowledge representation languages in use on the Web, propelled by the recommendation of the W3C. In this paper we present a practical implementation of a different kind of knowledge representation based on Prototypes. In detail, we present a concrete syntax easily and effectively parsable by...
Domain-specific common data elements (CDEs) are emerging as an effective approach to standards-based clinical research data storage and retrieval. A limiting factor, however, is the lack of robust automated quality assurance (QA) tools for the CDEs in clinical study domains. The objectives of the present study are to prototype and evaluate a QA too...
We present Shape Expressions (ShEx), an expressive schema language for RDF designed to provide a high-level, user-friendly syntax with intuitive semantics. ShEx allows one to describe the vocabulary and the structure of an RDF graph, and to constrain the allowed values for the properties of a node. It includes an algebraic grouping operator, a choice o...
EDITOR'S SUMMARY
Defined in 1999 and paired with XML, the Resource Description Framework (RDF) has been cast as an RDF Schema, producing data that is well‐structured but not validated, permitting certain illogical relationships. When stakeholders convened in 2014 to consider solutions to the data validation challenge, a W3C working group proposed R...
We study the expressiveness and complexity of Shape Expression Schema (ShEx), a novel schema formalism for RDF currently under development by W3C. ShEx assigns types to the nodes of an RDF graph and allows one to constrain the admissible neighborhoods of nodes of a given type with regular bag expressions (RBEs). We formalize and investigate two alterna...
There is a growing interest in the validation of RDF based solutions where one can express the topology of an RDF graph using some schema language that can check if RDF documents comply with it. Shape Expressions have been proposed as a simple, intuitive language that can be used to describe expected graph patterns and to validate RDF graphs agains...
RDF is a graph based data model which is widely used for semantic web and linked data applications. In this paper we describe a Shape Expression definition language which enables RDF validation through the declaration of constraints on the RDF model. Shape Expressions can be used to validate RDF data, communicate expected graph patterns for interfa...
Use of medical terminologies and mappings across them are considered to be crucial pre-requisites for achieving interoperable eHealth applications. Built upon the outcomes of several research projects, we introduce a framework for evaluating and utilizing terminology mappings that offers a platform for i) performing various mappings strategies, ii)...
Linked Data has forged new ground in developing easy-to-use, distributed databases. The prevalence of this data has enabled a new genre of social and scientific applications. At the same time, Semantic Web technology has failed to significantly displace SQL or XML in industrial applications, in part because it offers no equivalent schema publication...
We propose shape expression schema (ShEx), a novel schema formalism for describing the topology of an RDF graph that uses regular bag expressions (RBEs) to define constraints on the admissible neighborhood for the nodes of a given type. We provide two alternative semantics, multi- and single-type, depending on whether or not a node may have more th...
In order to improve the quality of linked data portals, it is necessary to have a tool that can automatically describe and validate the RDF triples exposed. RDF Shape Expressions have been proposed as a language based on Regular Expressions that can describe and validate the structure of RDF graphs. In this paper we describe the WebIndex, a medium...
The systems paradigm of modern medicine presents both an opportunity and a challenge for current Information and Communication Technology (ICT). The opportunity is to understand the spatio-temporal organisation and dynamics of the human body as an integrated whole, incorporating the biochemical, physiological, and environmental interactions that...
Two main trends emerged in the enterprise in the past years. On one hand, Web 2.0 tools such as blogs, microblogs and wikis for enterprise-scale collaboration and information management became widely used for information management, leading to a move to "Enterprise 2.0". At the same time, Semantic Web technologies have emerged allowing enterprise u...
Members of the W3C Health Care and Life Sciences Interest Group (HCLS IG) have published a variety of genomic and drug-related data sets as Resource Description Framework (RDF) triples. This experience has helped the interest group define a general data workflow for mapping health care and life science (HCLS) data to RDF and linking it with other L...
Sharing and describing experimental results unambiguously with sufficient detail to enable replication of results is a fundamental tenet of scientific research. In today's cluttered world of "-omics" sciences, data standards and standardized use of terminologies and ontologies for biomedical informatics play an important role in reporting high-thro...
Understanding how each individual's genetics and physiology influences pharmaceutical response is crucial to the realization of personalized medicine and the discovery and validation of pharmacogenomic biomarkers is key to its success. However, integration of genotype and phenotype knowledge in medical information systems remains a critical challen...
Sharing and describing experimental results unambiguously with sufficient detail to enable replication of results is a fundamental tenet of scientific research. In today's cluttered world of "-omics" sciences, data standards and standardized use of terminologies and ontologies for biomedical informatics play an important role in reporting high-thro...
The W3C's "Direct Mapping of Relational Data to RDF" defines a simple, practical and intuitive interpretation of SQL database tables as RDF graphs. This document specifies the formal data models for RDB (Relational DataBase) and RDF and defines a denotational semantics of RDB in the RDF domain. We show how this mapping treats all of the important f...
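The row-to-triples convention of the Direct Mapping can be sketched in a few lines of Python: each row becomes a subject IRI built from the table name and primary key, and each column becomes a predicate IRI of the form `<Table#column>`. The table name, key, and data below are invented, and a real implementation must also handle IRI-escaping, typed literals, and tables without primary keys:

```python
def direct_map(table_name, primary_key, rows):
    """Sketch of the Direct Mapping: turn table rows (dicts) into RDF-like triples."""
    triples = []
    for row in rows:
        # Row node IRI, e.g. <People/ID=7>, derived from the primary key value.
        subject = f"<{table_name}/{primary_key}={row[primary_key]}>"
        # Each row is typed by its table.
        triples.append((subject, "rdf:type", f"<{table_name}>"))
        # Each column value becomes one triple with a <Table#column> predicate.
        for column, value in row.items():
            triples.append((subject, f"<{table_name}#{column}>", value))
    return triples

rows = [{"ID": 7, "fname": "Bob"}]
for t in direct_map("People", "ID", rows):
    print(t)
```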
Translational medicine requires the integration of knowledge using heterogeneous data from health care to the life sciences. Here, we describe a collaborative effort to produce a prototype Translational Medicine Knowledge Base (TMKB) capable of answering questions relating to clinical practice and pharmaceutical drug discovery.
We developed the Tra...
Supplement 01 (v03) to “The Translational Medicine Ontology and Knowledge Base: Driving personalized medicine by bridging the gap between bench and bedside” A supplemental document containing the TMKB SPARQL queries and results created for this manuscript.
There is an abundance of information about drugs available on the Web. Data sources range from medicinal chemistry results, over the impact of drugs on gene expression, to the outcomes of drugs in clinical trials. These data are typically not connected together, which reduces the ease with which insights can be gained. Linking Open Drug Data (LODD)...
Query federation is a way to make use of distributed data sources on the Web, including many SPARQL endpoints as well as relational databases. SWObjects makes use of mapping rules represented as SPARQL Constructs to dynamically rewrite the terms and predicates of a SPARQL query into corresponding terms in another vocabulary and connect the resultin...
The Translational Medicine Ontology provides terminology that bridges diverse areas of translational medicine including hypothesis management, discovery research, drug development and formulation, clinical research, and clinical practice. Designed primarily from use cases, the ontology consists of essential terms that are mapped to other ontolog...
In the past two decades, industry-academia collaboration has emerged as the new paradigm in pharmaceutical research. The long-term success of such partnerships depends on unfettered sharing of data between researchers who operate in two vastly different environments. Moreover, some of these sources contain data that needs to be secured for a vari...
This paper describes a Semantic Web (SW) model for gene lists and the metadata required for their practical interpretation. Our provenance information captures the context of experiments as well as the processing and analysis parameters involved in deriving the gene lists from DNA microarray experiments. We demonstrate a range of practical neurosci...
As interest in adopting the Semantic Web in the biomedical domain continues to grow, Semantic Web technology has been evolving and maturing. A variety of technological approaches including triplestore technologies, SPARQL endpoints, Linked Data, and Vocabulary of Interlinked Datasets have emerged in recent years. In addition to the data warehouse c...
The Semantic Web for Health Care and Life Sciences Workshop will be held in Beijing, China, on April 22, 2008. The goal of the workshop is to foster the development and advancement in the use of Semantic Web technologies to facilitate collaboration, research and development, and innovation adoption in the domains of Health Care and Life Sciences, W...
https://www.w3.org/TR/rdf-sparql-query/
RDF is a directed, labeled graph data format for representing information in the Web. This specification defines the syntax and semantics of the SPARQL query language for RDF. SPARQL can be used to express queries across diverse data sources, whether the data is stored natively as RDF or viewed as RDF via mi...
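A SPARQL basic graph pattern is evaluated by finding variable bindings that are consistent across all of its triple patterns. The following hand-rolled sketch (invented data; `?`-prefixed strings play the role of variables) illustrates that core idea only, not the full SPARQL semantics:

```python
graph = [
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:bob",   "foaf:name",  "Bob"),
]

def bgp(graph, patterns, binding=None):
    """Return all variable bindings satisfying every pattern (a basic graph pattern)."""
    binding = binding or {}
    if not patterns:
        return [binding]
    head, rest = patterns[0], patterns[1:]
    solutions = []
    for triple in graph:
        b = dict(binding)
        ok = True
        for term, value in zip(head, triple):
            if term.startswith("?"):
                # Variable: bind it, or require consistency with an earlier binding.
                if b.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:  # Constant: must match exactly.
                ok = False
                break
        if ok:
            solutions.extend(bgp(graph, rest, b))
    return solutions

# SELECT ?name WHERE { ex:alice foaf:knows ?x . ?x foaf:name ?name }
query = [("ex:alice", "foaf:knows", "?x"), ("?x", "foaf:name", "?name")]
print([b["?name"] for b in bgp(graph, query)])  # ['Bob']
```

The shared `?x` across the two patterns is what makes this a join rather than two independent lookups.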
INTRODUCTION
Traditional bookmark systems provide inadequate support for Web users with a variety of devices, including smart phones and their browsers. In addition, they offer very little support for sharing bookmarks and topics between groups of users working together. The Web is a collaborative space that lets users share their thoughts, their...
Annotea is a Web-based shared annotation system based on a general-purpose open resource description framework (RDF) infrastructure, where annotations are modeled as a class of metadata. Annotations are viewed as statements made by an author about a Web document. Annotations are external to the documents and can be stored in one or more annotation...
We describe our investigation of the effect of persistent connections, pipelining and link level document compression on our client and server HTTP implementations. A simple test setup is used to verify HTTP/1.1's design and understand HTTP/1.1 implementation strategies. We present TCP and real time performance data between the libwww robot [27] an...