Eric Prud'hommeaux

Massachusetts Institute of Technology | MIT

About

69
Publications
14,129
Reads
5,893
Citations
Additional affiliations
January 1995 - present
Massachusetts Institute of Technology
Position
  • Sanitation Engineer
Description
  • geek


Publications (69)
Preprint
Knowledge Graphs (KGs) such as Wikidata act as hubs of information from multiple domains and disciplines, and are crowdsourced by multiple stakeholders. The vast amount of available information makes it difficult for researchers to manage the entire KG, which is also continually being edited. It is necessary to develop tools that extract subsets fo...
Article
Full-text available
Urgent global research demands real-time dissemination of precise data. Wikidata, a collaborative and openly licensed knowledge graph available in RDF format, provides an ideal forum for exchanging structured data that can be verified and consolidated using validation schemas and bot edits. In this research article, we catalog an automatable task s...
Article
Background Knowledge graphs (KGs) play a key role in enabling explainable artificial intelligence (AI) applications in healthcare. Constructing clinical knowledge graphs (CKGs) against heterogeneous electronic health records (EHRs) has been desired by the research and healthcare AI communities. From the standardization perspective, community-based st...
Preprint
Full-text available
Knowledge graphs have successfully been adopted by academia, government and industry to represent large scale knowledge bases. Open and collaborative knowledge graphs such as Wikidata capture knowledge from different domains and harmonize them under a common format, making it easier for researchers to access the data while also supporting Open Sci...
Article
Resource Description Framework (RDF) is one of the three standardized data formats in the HL7 Fast Healthcare Interoperability Resources (FHIR) specification and is being used by healthcare and research organizations to join FHIR and non-FHIR data. However, RDF previously had not been integrated into popular FHIR tooling packages, hindering the ado...
Article
This study developed and evaluated a JSON-LD 1.1 approach to automate the Resource Description Framework (RDF) serialization and deserialization of Fast Healthcare Interoperability Resources (FHIR) data, in preparation for updating the FHIR RDF standard. We first demonstrated that this JSON-LD 1.1 approach can produce the same output as the current...
Chapter
Full-text available
We discuss Shape Expressions (ShEx), a concise, formal, modeling and validation language for RDF structures. For instance, a Shape Expression could prescribe that subjects in a given RDF graph that fall into the shape “Paper” are expected to have a section called “Abstract”, and any ShEx implementation can confirm whether that is indeed the case fo...
Chapter
The RDF data model forms a cornerstone of the Semantic Web technology stack. Although there have been different proposals for RDF serialization syntaxes, the underlying simple data model enables great flexibility which allows it to be successfully employed in many different scenarios and to form the basis on which other technologies are developed....
Chapter
People have been using computers to record and reason about data for many decades. Typically, this reasoning is less esoteric than artificial intelligence tasks like classification.
Chapter
This chapter includes a short overview of the RDF data model and the Turtle notation, as well as some technologies like SPARQL, RDF Schema, and OWL that form part of the RDF ecosystem.
Chapter
Shapes Constraint Language (SHACL) has been developed by the W3C RDF Data Shapes Working Group, which was chartered in 2014 with the goal to “produce a language for defining structural constraints on RDF graphs [6].”
Chapter
Shape Expressions (ShEx) is a schema language for describing RDF graph structures. ShEx was originally developed in late 2013 to provide a human-readable syntax for OSLC Resource Shapes. It added disjunctions, so it was more expressive than Resource Shapes. Tokens in the language were adopted from Turtle [80] and SPARQL [44] with tokens for groupi...
Chapter
In this chapter we describe several applications of RDF validation. We start with the WebIndex, a medium-size linked data portal that was one of the earliest applications of ShEx. We describe it using ShEx and SHACL so the reader can see how both formalisms can be applied to describe RDF data.
Chapter
In this chapter we present a comparison between ShEx and SHACL. The technologies have similar goals and similar features. In fact at the start of the Data Shapes Working Group in 2014, convergence on a unified approach was considered possible. However, this did not happen and as of July 2017 both technologies are maintained as separate solutions.
Conference Paper
We present a formal semantics and proof of soundness for shapes schemas, an expressive schema language for RDF graphs that is the foundation of Shape Expressions Language 2.0. It can be used to describe the vocabulary and the structure of an RDF graph, and to constrain the admissible properties and values for nodes in that graph. The language defin...
Article
Full-text available
RDF and Linked Data have broad applicability across many fields, from aircraft manufacturing to zoology. Requirements for detecting bad data differ across communities, fields, and tasks, but nearly all involve some form of data validation. This book introduces data validation and describes its practical use in day-to-day data exchange. The Semantic...
Article
In this paper, we present a platform known as D2Refine for facilitating clinical research study data element harmonization and standardization. D2Refine is developed on top of OpenRefine (formerly Google Refine) and leverages simple interface and extensible architecture of OpenRefine. D2Refine empowers the tabular representation of clinical researc...
Article
Researchers commonly use a tabular format to describe and represent clinical study data. The lack of standardization of data dictionary's metadata elements presents challenges for their harmonization for similar studies and impedes interoperability outside the local context. We propose that representing data dictionaries in the form of standardized...
Article
Background HL7 Fast Healthcare Interoperability Resources (FHIR) is an emerging open standard for the exchange of electronic healthcare information. FHIR resources are defined in a specialized modeling language. FHIR instances can currently be represented in either XML or JSON. The FHIR and Semantic Web communities are developing a third FHIR insta...
Article
Full-text available
Linked data portals need to be able to advertise and describe the structure of their content. A sufficiently expressive and intuitive schema language will allow portals to communicate these structures. Validation tools will aid in the publication and maintenance of linked data and increase their quality. Two schema language proposals have recently...
Article
The OHDSI Common Data Model (CDM) is a deep information model, in which its vocabulary component plays a critical role in enabling consistent coding and query of clinical data. The objective of the study is to create methods and tools to expose the OHDSI vocabularies and mappings as the vocabulary mapping services using two HL7 FHIR core terminolog...
Article
A variety of data models have been developed to provide a standardized data interface that supports organizing clinical research data into a standard structure for building the integrated data repositories. HL7 Fast Healthcare Interoperability Resources (FHIR) is emerging as a next generation standards framework for facilitating health care and ele...
Conference Paper
Recently, RDF and OWL have become the most common knowledge representation languages in use on the Web, propelled by the recommendation of the W3C. In this paper we examine an alternative way to represent knowledge based on Prototypes. This Prototype-based representation has different properties, which we argue to be more suitable for data sharing...
Article
In recent years RDF and OWL have become the most common knowledge representation languages in use on the Web, propelled by the recommendation of the W3C. In this paper we present a practical implementation of a different kind of knowledge representation based on Prototypes. In detail, we present a concrete syntax easily and effectively parsable by...
Article
Domain-specific common data elements (CDEs) are emerging as an effective approach to standards-based clinical research data storage and retrieval. A limiting factor, however, is the lack of robust automated quality assurance (QA) tools for the CDEs in clinical study domains. The objectives of the present study are to prototype and evaluate a QA too...
Article
Full-text available
We present Shape Expressions (ShEx), an expressive schema language for RDF designed to provide a high-level, user friendly syntax with intuitive semantics. ShEx allows to describe the vocabulary and the structure of an RDF graph, and to constrain the allowed values for the properties of a node. It includes an algebraic grouping operator, a choice o...
Article
EDITOR'S SUMMARY Defined in 1999 and paired with XML, the Resource Description Framework (RDF) has been cast as an RDF Schema, producing data that is well‐structured but not validated, permitting certain illogical relationships. When stakeholders convened in 2014 to consider solutions to the data validation challenge, a W3C working group proposed R...
Article
Full-text available
We study the expressiveness and complexity of Shape Expression Schema (ShEx), a novel schema formalism for RDF currently under development by W3C. ShEx assigns types to the nodes of an RDF graph and allows to constrain the admissible neighborhoods of nodes of a given type with regular bag expressions (RBEs). We formalize and investigate two alterna...
Article
Full-text available
There is a growing interest in the validation of RDF based solutions where one can express the topology of an RDF graph using some schema language that can check if RDF documents comply with it. Shape Expressions have been proposed as a simple, intuitive language that can be used to describe expected graph patterns and to validate RDF graphs agains...
Article
Full-text available
RDF is a graph based data model which is widely used for semantic web and linked data applications. In this paper we describe a Shape Expression definition language which enables RDF validation through the declaration of constraints on the RDF model. Shape Expressions can be used to validate RDF data, communicate expected graph patterns for interfa...
Article
Full-text available
Use of medical terminologies and mappings across them are considered to be crucial pre-requisites for achieving interoperable eHealth applications. Built upon the outcomes of several research projects, we introduce a framework for evaluating and utilizing terminology mappings that offers a platform for i) performing various mappings strategies, ii)...
Conference Paper
Linked Data has forged new ground in developing easy-to-use, distributed databases. The prevalence of this data has enabled a new genre of social and scientific applications. At the same time, Semantic Web technology has failed to significantly displace SQL or XML in industrial applications, in part because it offers no equivalent schema publication...
Article
Full-text available
We propose shape expression schema (ShEx), a novel schema formalism for describing the topology of an RDF graph that uses regular bag expressions (RBEs) to define constraints on the admissible neighborhood for the nodes of a given type. We provide two alternative semantics, multi- and single-type, depending on whether or not a node may have more th...
Article
In order to improve the quality of linked data portals, it is necessary to have a tool that can automatically describe and validate the RDF triples exposed. RDF Shape Expressions have been proposed as a language based on Regular Expressions that can describe and validate the structure of RDF graphs. In this paper we describe the WebIndex, a medium...
Article
Full-text available
The systems paradigm of modern medicine presents both an opportunity and a challenge for current Information and Communication Technology (ICT). The opportunity is to understand the spatio-temporal organisation and dynamics of the human body as an integrated whole, incorporating the biochemical, physiological, and environmental interactions that...
Chapter
Two main trends emerged in the enterprise in the past years. On one hand, Web 2.0 tools such as blogs, microblogs and wikis for enterprise-scale collaboration and information management became widely used for information management, leading to a move to "Enterprise 2.0". At the same time, Semantic Web technologies have emerged allowing enterprise u...
Article
Members of the W3C Health Care and Life Sciences Interest Group (HCLS IG) have published a variety of genomic and drug-related data sets as Resource Description Framework (RDF) triples. This experience has helped the interest group define a general data workflow for mapping health care and life science (HCLS) data to RDF and linking it with other L...
Article
Sharing and describing experimental results unambiguously with sufficient detail to enable replication of results is a fundamental tenet of scientific research. In today's cluttered world of "-omics" sciences, data standards and standardized use of terminologies and ontologies for biomedical informatics play an important role in reporting high-thro...
Article
Full-text available
Understanding how each individual's genetics and physiology influences pharmaceutical response is crucial to the realization of personalized medicine and the discovery and validation of pharmacogenomic biomarkers is key to its success. However, integration of genotype and phenotype knowledge in medical information systems remains a critical challen...
Conference Paper
Sharing and describing experimental results unambiguously with sufficient detail to enable replication of results is a fundamental tenet of scientific research. In today's cluttered world of "-omics" sciences, data standards and standardized use of terminologies and ontologies for biomedical informatics play an important role in reporting high-thro...
Conference Paper
The W3C's "Direct Mapping of Relational Data to RDF" defines a simple, practical and intuitive interpretation of SQL database tables as RDF graphs. This document specifies the formal data models for RDB (Relational DataBase) and RDF and defines a denotational semantics of RDB in the RDF domain. We show how this mapping treats all of the important f...
Article
Full-text available
Translational medicine requires the integration of knowledge using heterogeneous data from health care to the life sciences. Here, we describe a collaborative effort to produce a prototype Translational Medicine Knowledge Base (TMKB) capable of answering questions relating to clinical practice and pharmaceutical drug discovery. We developed the Tra...
Data
Supplement 01 (v03) to “The Translational Medicine Ontology and Knowledge Base: Driving personalized medicine by bridging the gap between bench and bedside” A supplemental document containing the TMKB SPARQL queries and results created for this manuscript.
Article
Full-text available
There is an abundance of information about drugs available on the Web. Data sources range from medicinal chemistry results, over the impact of drugs on gene expression, to the outcomes of drugs in clinical trials. These data are typically not connected together, which reduces the ease with which insights can be gained. Linking Open Drug Data (LODD)...
Article
Full-text available
Query federation is a way to make use of distributed data sources on the Web, including many SPARQL endpoints as well as relational databases. SWObjects makes use of mapping rules represented as SPARQL Constructs to dynamically rewrite the terms and predicates of a SPARQL query into corresponding terms in another vocabulary and connect the resultin...
Article
Full-text available
The Translational Medicine Ontology provides terminology that bridges diverse areas of translational medicine including hypothesis management, discovery research, drug development and formulation, clinical research, and clinical practice. Designed primarily from use cases, the ontology consists of essential terms that are mapped to other ontolog...
Article
In the past two decades, industry-academia collaboration has emerged as the new paradigm in pharmaceutical research. The long term success of such partnerships depends on unfettered sharing of data between researchers who operate in two vastly different environments. Moreover, some of these sources contain data that needs to be secured for a vari...
Article
This paper describes a Semantic Web (SW) model for gene lists and the metadata required for their practical interpretation. Our provenance information captures the context of experiments as well as the processing and analysis parameters involved in deriving the gene lists from DNA microarray experiments. We demonstrate a range of practical neurosci...
Article
Full-text available
As interest in adopting the Semantic Web in the biomedical domain continues to grow, Semantic Web technology has been evolving and maturing. A variety of technological approaches including triplestore technologies, SPARQL endpoints, Linked Data, and Vocabulary of Interlinked Datasets have emerged in recent years. In addition to the data warehouse c...
Conference Paper
Full-text available
The Semantic Web for Health Care and Life Sciences Workshop will be held in Beijing, China, on April 22, 2008. The goal of the workshop is to foster the development and advancement in the use of Semantic Web technologies to facilitate collaboration, research and development, and innovation adoption in the domains of Health Care and Life Sciences, W...
Article
https://www.w3.org/TR/rdf-sparql-query/ RDF is a directed, labeled graph data format for representing information in the Web. This specification defines the syntax and semantics of the SPARQL query language for RDF. SPARQL can be used to express queries across diverse data sources, whether the data is stored natively as RDF or viewed as RDF via mi...
Article
Full-text available
INTRODUCTION Traditional bookmark systems provide inadequate support for Web users with a variety of devices including smart phones and their browsers. In addition, they offer very little support for sharing the bookmarks and topics between groups of users working together. The Web is a collaborative space that lets users share their thoughts, their...
Article
Annotea is a Web-based shared annotation system based on a general-purpose open resource description framework (RDF) infrastructure, where annotations are modeled as a class of metadata. Annotations are viewed as statements made by an author about a Web document. Annotations are external to the documents and can be stored in one or more annotation...
Article
Full-text available
We describe our investigation of the effect of persistent connections, pipelining and link level document compression on our client and server HTTP implementations. A simple test setup is used to verify HTTP/1.1's design and understand HTTP/1.1 implementation strategies. We present TCP and real time performance data between the libwww robot [27] an...
