Article

Abstract

Introducing -omic information into current clinical practice is one of the main challenges in translating the huge volume of genomic research results. The number of potential translational clinical trials is therefore increasing dramatically, with a corresponding increase in patient variability. Such a scenario requires a larger population from which to recruit a minimum set of patients, which may involve multi-centric trials and the associated challenges of heterogeneous data integration. To ensure sustainable clinical trial management, semantic interoperability is one of the main goals addressed by international initiatives such as the EU-funded INTEGRATE project: "Driving Excellence in Integrative Cancer Research". This paper describes the approach adopted within this international research initiative to provide a homogeneous platform for managing clinical information from patients in breast cancer clinical trials. Following the project's leitmotif of reusing standards supported by a large community, we have developed a solution that provides a common data model (HL7 RIM-based), a biomedical domain vocabulary (SNOMED) as core dataset, and resources from the semantic web community adapted to the biomedical domain. After a year and a half of collaboration, the INTEGRATE consortium has developed a solution providing the reasoning capabilities required for clinical trial patient recruitment. The next challenge will be to extend the current solution to support a cohort selection tool allowing prospective analysis and predictive modeling.
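The normalization idea sketched in the abstract can be illustrated as follows: site-specific record values are mapped onto coded observations in a shared patient model that uses a common vocabulary. This is only a minimal sketch; the mapping table, field names, and codes below are hypothetical, not the project's actual model.

```python
# Hedged sketch of vocabulary-based normalization: a hypothetical
# site-specific mapping table translates local values into common codes.
LOCAL_TO_COMMON_CODE = {
    "ER_POS": "416053008",    # illustrative code, not verified
    "BRCA_POS": "412734009",  # illustrative code, not verified
}

def normalize(local_record):
    """Convert a site-specific record into common-model coded observations."""
    observations = []
    for field_name, value in local_record.items():
        code = LOCAL_TO_COMMON_CODE.get(value)
        if code is not None:
            observations.append({"field": field_name, "code": code})
    return observations

obs = normalize({"receptor_status": "ER_POS"})
```

Once every source site supplies such a mapping, downstream tools can query one coded representation instead of each site's local conventions.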


... A Trial metadata repository contains protocol definitions ( Figure 1) and the SNAQL engine executes the DSL of the criteria to find patients belonging to the cohort. The SNAQL engine accesses the semantic integration services which provide a query interface [2] with reasoning abilities. Once a clinical protocol has been finalized, the tool is used to find patients eligible for enrollment. ...
... Our data model leverages the HL7 RIM and the HL7 implementation guidelines. The underlying semantic solution is described in [5]. In [6] we estimated the effort required to implement mappings to the patient information model, the scalability of our solution with the number of trials, and its extensibility to other clinical domains. ...
Article
Full-text available
To support the efficient execution of post-genomic multi-centric clinical trials in breast cancer, we propose a solution that streamlines the assessment of patient eligibility for available trials. Assessing a patient's eligibility for a trial requires evaluating whether each eligibility criterion is satisfied, and is often a time-consuming, manual task. The main focus in the literature has been on proposing different methods for modelling and formalizing the eligibility criteria; however, the current adoption of these approaches in clinical care is limited. Less effort has been dedicated to automatically matching criteria to the patient data managed in clinical care. We address both aspects and propose a scalable, efficient and pragmatic patient screening solution enabling automatic evaluation of patient eligibility for a relevant set of trials. This covers the flexible formalization of criteria and of other relevant trial metadata, and the efficient management of these representations.
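The core matching step described above, evaluating every formalized criterion against a patient record, can be sketched in a few lines. This is a toy illustration, not the paper's criterion language: the record fields, criterion predicates, and SNOMED code are illustrative assumptions.

```python
# Toy sketch of automatic eligibility evaluation: each criterion is a
# predicate over a coded patient record; a patient matches a trial only
# if every criterion holds.
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class PatientRecord:
    age: int
    codes: set = field(default_factory=set)  # observed codes (illustrative)

@dataclass
class Criterion:
    description: str
    predicate: Callable[[PatientRecord], bool]

def eligible(patient, criteria):
    """True only if the patient satisfies every eligibility criterion."""
    return all(c.predicate(patient) for c in criteria)

# Hypothetical trial: adults with a breast-cancer diagnosis code.
criteria = [
    Criterion("age >= 18", lambda p: p.age >= 18),
    Criterion("breast cancer diagnosis",
              lambda p: "254837009" in p.codes),  # illustrative SNOMED code
]

cohort = [
    PatientRecord(age=52, codes={"254837009"}),
    PatientRecord(age=16, codes={"254837009"}),
]
matches = [p for p in cohort if eligible(p, criteria)]
```

Keeping criteria as data rather than hard-coded checks is what allows the same screening engine to serve many trials.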
Article
Semantic interoperability is essential to facilitate efficient collaboration in heterogeneous multi-site healthcare environments. The deployment of a semantic interoperability solution has the potential to enable a wide range of informatics-supported applications in clinical care and research, both within a single healthcare organization and in a network of organizations. At the same time, building and deploying a semantic interoperability solution may require significant effort to carry out data transformation and to harmonize the semantics of the information in the different systems. Our approach to semantic interoperability leverages existing healthcare standards and ontologies, focusing first on specific clinical domains and key applications, and gradually expanding the solution when needed. An important objective of this work is to create a semantic link between clinical research and care environments to enable applications such as streamlining the execution of multi-centric clinical trials, including the identification of eligible patients for the trials. This paper presents an analysis of the suitability of several widely used medical ontologies in the clinical domain (SNOMED CT, LOINC, MedDRA) to capture the semantics of clinical trial eligibility criteria, of clinical trial data (e.g., Case Report Forms), and of the corresponding patient record data that would enable the automatic identification of eligible patients. In addition to the coverage provided by the ontologies, we evaluate and compare the sizes of the sets of relevant concepts and their relative frequency to estimate the cost of data transformation, of building the necessary semantic mappings, and of extending the solution to new domains. This analysis shows that our approach is both feasible and scalable.
Article
Post-genomic clinical trials require the participation of multiple institutions and the collection of data from several hospitals, laboratories and research facilities. This paper presents a standards-based solution that provides a uniform access endpoint to patient data involved in current clinical research. The proposed approach exploits well-established standards such as HL7 v3 and SPARQL, and medical vocabularies such as SNOMED CT, LOINC and HGNC. A novel mechanism to exploit semantic normalization between HL7-based data models and biomedical ontologies has been created using Semantic Web technologies. Different types of queries have been used to test the semantic interoperability solution described in this paper. The execution times obtained in the tests enable the development of end-user tools within a framework that requires efficient retrieval of integrated data. The proposed approach has been successfully tested by applications within the INTEGRATE and EURECA EU projects. These applications have been deployed and tested for: (i) patient screening, (ii) trial recruitment, and (iii) retrospective analysis; exploiting semantically interoperable access to clinical patient data from heterogeneous data sources.
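The kind of query served by such a uniform endpoint can be illustrated with a toy triple store and a single SPARQL-like basic graph pattern. This stands in for, and is much simpler than, the actual HL7/SPARQL infrastructure; the prefixes and codes are illustrative.

```python
# Toy illustration of querying RDF-style patient triples. None plays the
# role of a SPARQL variable in a single (subject, predicate, object) pattern.
TRIPLES = {
    ("patient:1", "rdf:type", "ex:Patient"),
    ("patient:1", "ex:hasDiagnosis", "sct:254837009"),  # illustrative code
    ("patient:2", "rdf:type", "ex:Patient"),
    ("patient:2", "ex:hasDiagnosis", "sct:44054006"),   # illustrative code
}

def match(pattern, triples):
    """Return all triples matching one pattern; None matches anything."""
    s, p, o = pattern
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

# Roughly: SELECT ?p WHERE { ?p ex:hasDiagnosis sct:254837009 }
hits = match((None, "ex:hasDiagnosis", "sct:254837009"), TRIPLES)
```

A real endpoint adds vocabulary reasoning on top of this matching, so that a query for a general concept also retrieves patients coded with its subtypes.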
Article
Advances in the use of -omic data and other biomarkers are increasing the number of variables in clinical research. These additional data have stratified the patient population and require that current studies be performed across multiple institutions. Semantic interoperability and standardized data representation are crucial tasks in the management of modern clinical trials. In the past few years, different efforts have focused on integrating biomedical information. Due to the complexity of this domain and the specific requirements of clinical research, the majority of data integration tasks are still performed manually. This paper presents a semantic normalization process and a query abstraction mechanism to facilitate data integration and retrieval. A process based on well-established standards from the biomedical domain and the latest semantic web technologies has been developed. The methods proposed in this work have been tested within the EURECA EU research project, where clinical scenarios require the extraction of semantic knowledge from biomedical vocabularies. The aim of this work is to provide a novel method to abstract from the data model and query syntax. The proposed approach has been compared with other initiatives in the field by storing the same dataset with each of those solutions. Results show extended functionality and query capabilities at the cost of slightly worse query execution performance. Implementations in real settings have shown that, following this approach, usable interfaces can be developed to exploit clinical trial data outcomes.
Conference Paper
Current research in oncology requires the involvement of several institutions participating in clinical trials. Heterogeneity of data formats and models requires advanced methods to achieve semantic interoperability and provide sustainable solutions. In this field, the EU-funded INTEGRATE project aims to develop the basic knowledge to allow sharing of data from post-genomic clinical trials on breast cancer. In this paper, we describe the procedure implemented in this project and the required binding between relevant terminologies such as SNOMED CT and an HL7 v3 Reference Information Model (RIM)-based data model. Following the HL7 recommendations, we also describe the main issues encountered in this process, such as concept overlap and coverage of the domain terminology, and the proposed solutions. Despite the high level of heterogeneity in the data from this domain, the methods and solutions introduced in this paper have been successfully applied within the INTEGRATE project context. Results suggest that the level of semantic interoperability required to manage patient data in modern clinical trials on breast cancer can be achieved with the proposed methodology.
Conference Paper
Full-text available
OWLIM is a high-performance Storage and Inference Layer (SAIL) for Sesame, which performs OWL DLP reasoning based on forward-chaining of entailment rules. The reasoning and query evaluation are performed in-memory, while at the same time OWLIM provides reliable persistence based on N-Triples files. This paper presents OWLIM, together with an evaluation of its scalability over a synthetic but realistic dataset encoded with respect to the PROTON ontology. The experiment demonstrates that OWLIM can scale to millions of statements even on commodity desktop hardware. On an almost-entry-level server, OWLIM can manage a knowledge base of 10 million explicit statements, which are extended to about 19 million after forward chaining. The upload and storage speed is about 3,000 statements/sec at the maximal size of the repository, but it starts at more than 18,000 (for a small repository) and slows down smoothly. As can be expected for such an inference strategy, delete operations are expensive, taking as much as a few minutes. At the same time, a variety of queries can be evaluated within milliseconds. The experiment shows that such reasoners can be efficient for very big knowledge bases, in scenarios where delete operations need not be handled in real time.
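The forward-chaining materialization strategy this abstract describes can be sketched with a single entailment rule: repeatedly apply the rule to the explicit statements until no new triples are derived (a fixpoint). The sketch below uses only `rdfs:subClassOf` transitivity and illustrative class names; OWLIM applies a full rule set, and far more efficiently.

```python
# Minimal sketch of forward-chaining materialization: derive the
# transitive closure of rdfs:subClassOf by iterating to a fixpoint.
def materialize(explicit):
    """Extend explicit triples with all rdfs:subClassOf entailments."""
    triples = set(explicit)
    changed = True
    while changed:
        changed = False
        sub = [(s, o) for s, p, o in triples if p == "rdfs:subClassOf"]
        for a, b in sub:
            for c, d in sub:
                # Rule: (a subClassOf b) and (b subClassOf d) => (a subClassOf d)
                if b == c and (a, "rdfs:subClassOf", d) not in triples:
                    triples.add((a, "rdfs:subClassOf", d))
                    changed = True
    return triples

explicit = {
    ("ex:DuctalCarcinoma", "rdfs:subClassOf", "ex:BreastCancer"),
    ("ex:BreastCancer", "rdfs:subClassOf", "ex:Cancer"),
}
inferred = materialize(explicit)
```

Materializing entailments up front is what makes queries cheap and deletes expensive: a deleted statement may invalidate derived triples, forcing re-computation.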
Conference Paper
Full-text available
This paper discusses RDF-related work in the context of OpenLink Virtuoso, a general-purpose relational/federated database and applications platform. We discuss adapting a relational engine for native RDF support with dedicated data types, bitmap indexing and SQL optimizer techniques. We further discuss mapping existing relational data into RDF for SPARQL access without converting the data into physical triples. We present conclusions and metrics as well as a number of use cases, from DBpedia to bioinformatics and collaborative web applications.
Article
Initially, Grid technologies were principally associated with supercomputer centres and large-scale scientific applications in physics and astronomy. They are now increasingly seen as being relevant to many areas of e-Science and e-Business. The emergence of the Open Grid Services Architecture (OGSA), to complement the ongoing activity on Web Services standards, promises to provide a service-based platform that can meet the needs of both business and scientific applications. Early Grid applications focused principally on the storage, replication and movement of file-based data. Now the need for the full integration of database technologies with Grid middleware is widely recognized. Not only do many Grid applications already use databases for managing metadata, but increasingly many are associated with large databases of domain-specific information (e.g. biological or astronomical data). This paper describes the design and implementation of OGSA-DAI, a service-based architecture for database access over the Grid. The approach involves the design of Grid Data Services that allow consumers to discover the properties of structured data stores and to access their contents. The initial focus has been on support for access to Relational and XML data, but the overall architecture has been designed to be extensible to accommodate different storage paradigms. The paper describes and motivates the design decisions that have been taken, and illustrates how the approach supports a range of application scenarios. The OGSA-DAI software is freely available from http://www.ogsadai.org.uk.
Conference Paper
We describe an optimised consequence-based procedure for classification of ontologies expressed in a polynomial fragment ELHR+ of the OWL 2 EL profile. A distinguishing property of our procedure is that it can take advantage of multiple processors/cores, which increasingly prevail in computer systems. Our solution is based on a variant of the 'given clause' saturation algorithm for first-order theorem proving, where we assign derived axioms to 'contexts' within which they can be used and which can be processed independently. We describe an implementation of our procedure within the Java-based reasoner ELK. Our implementation is lightweight in the sense that the overhead of managing concurrent computations is minimal. This is achieved by employing lock-free data structures and operations such as 'compare-and-swap'. We report preliminary experimental results demonstrating a substantial speedup of ontology classification on multi-core systems. In particular, SNOMED CT, one of the largest and most widely used medical ontologies, can be classified in as little as 5 seconds.
The challenges regarding seamless integration of distributed, heterogeneous and multilevel data arising in the context of contemporary, post-genomic clinical trials cannot be effectively addressed with current methodologies. An urgent need exists to access data in a uniform manner, to share information among different clinical and research centers, and to store data in secure repositories assuring the privacy of patients. Advancing Clinico-Genomic Trials (ACGT) was a European Commission funded Integrated Project that aimed at providing tools and methods to enhance the efficiency of clinical trials in the -omics era. The project, now completed after four years of work, involved the development of both a set of methodological approaches as well as tools and services and its testing in the context of real-world clinico-genomic scenarios. This paper describes the main experiences using the ACGT platform and its tools within one such scenario and highlights the very promising results obtained.
Article
This paper introduces the objectives, methods and results of ontology development in the EU co-funded project Advancing Clinico-Genomic Trials on Cancer: Open Grid Services for Improving Medical Knowledge Discovery (ACGT). While the available data in the life sciences has recently grown both in amount and quality, its full exploitation is being hindered by the use of different underlying technologies, coding systems, category schemes and reporting methods by different research groups. The goal of the ACGT project is to contribute to the resolution of these problems by developing an ontology-driven, semantic grid services infrastructure that will enable efficient execution of discovery-driven scientific workflows in the context of multi-centric, post-genomic clinical trials. The focus of the present paper is the ACGT Master Ontology (MO). ACGT project researchers undertook a systematic review of existing domain and upper-level ontologies, as well as of existing ontology design software, implementation methods, and end-user interfaces. This included the careful study of best practices, design principles and evaluation methods for ontology design, maintenance, implementation, and versioning, as well as for use by domain experts and clinicians.
To date, the results of the ACGT project include (i) the development of a master ontology (the ACGT-MO) based on clearly defined principles of ontology development and evaluation; (ii) the development of a technical infrastructure (the ACGT Platform) that implements the ACGT-MO utilizing independent tools, components and resources that have been developed based on open architectural standards, and which includes an application updating and evolving the ontology efficiently in response to end-user needs; and (iii) the development of an Ontology-based Trial Management Application (ObTiMA) that integrates the ACGT-MO into the design process of clinical trials in order to guarantee automatic semantic integration without the need to perform a separate mapping process.