The aim of translational cancer research is to transfer biomedical discoveries from the
bench to the bedside. This is a challenging goal, since each cancer is a complex system
with individual molecular features which determine the actual dynamics of the disease,
such as its prognosis and response to therapies. Understanding those biological traits
is fundamental in order to harness them in advance to improve clinical care and
precision medicine. To achieve this, pre-clinical research procedures have been developed,
which deploy large-scale experiments involving serial propagation of patients’ samples
through in vivo and in vitro cultures [1, 2]. By preserving the fundamental biological
properties of the collected material (e.g., sensitivity to specific therapies), such
approaches make it possible to challenge the tumor material with different perturbations
(e.g., drugs) and to measure how it responds (e.g., tumor shrinkage). These processes
generate massive collections of hierarchical data (i.e., experimental trees) which may be
annotated with heterogeneous notes based on experimental results and observations,
thus creating huge datasets that are extremely difficult to analyze both by humans and
by machines. To address such issues in data analysis, we created the Semalytics data
framework, the core of an analytical platform that processes experimental information
through Semantic Web technologies. The platform enables users to bind experimental
data to knowledge items (i.e., metadata describing biological properties) and to
investigate such annotations. Semalytics (i) allows the efficient exploration of
experimental trees of undefined depth together with their annotations and (ii) links
its data to a wider open knowledge base (i.e., Wikidata), adding an extended knowledge
layer without the need to manage and curate those data locally. Altogether,
Semalytics provides an augmented perspective on experimental data, allowing
the generation of new hypotheses that the user did not anticipate a priori.
In this thesis, we present our research on the data framework of Semalytics, focusing
on its semantic nucleus and on how it exploits semantic reasoning to tackle the issues
that arise in this kind of analysis. Finally, we describe a proof-of-concept study based
on the examination of several dozen cases of metastatic colorectal cancer in order to
illustrate how Semalytics can help researchers generate hypotheses about the role of gene
alterations in causing resistance or sensitivity of cancer cells to specific drugs.
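
The exploration of experimental trees of undefined depth described above can be illustrated with a minimal Python sketch. It is not the Semalytics implementation (which relies on Semantic Web technologies); the sample identifiers, the annotation labels, and the function names `descendants` and `annotations_in_subtree` are all hypothetical and serve only to show how annotations attached at any depth can be gathered for a chosen root sample.

```python
from collections import deque

# Hypothetical experimental tree: each sample maps to the samples derived from it.
children = {
    "patient_sample": ["xenograft_1", "cell_line_1"],
    "xenograft_1": ["xenograft_2"],
    "xenograft_2": ["xenograft_3"],
    "xenograft_3": [],
    "cell_line_1": [],
}

# Hypothetical annotations bound to individual samples.
annotations = {
    "patient_sample": {"gene_alteration_X"},
    "xenograft_2": {"resistant_to_drug_Y"},
}


def descendants(root, children):
    """Return every sample derived from `root`, at any depth, breadth-first.

    Assumes the structure is a tree (no cycles), so no visited set is needed.
    """
    found = []
    queue = deque(children.get(root, []))
    while queue:
        node = queue.popleft()
        found.append(node)
        queue.extend(children.get(node, []))
    return found


def annotations_in_subtree(root, children, annotations):
    """Collect the annotations of `root` and of all samples derived from it."""
    collected = set(annotations.get(root, set()))
    for node in descendants(root, children):
        collected |= annotations.get(node, set())
    return collected


if __name__ == "__main__":
    print(descendants("patient_sample", children))
    print(annotations_in_subtree("patient_sample", children, annotations))
```

Because the traversal uses a queue rather than a fixed number of nested lookups, the same function works whether a lineage has two generations or twenty.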
In recent years, Laboratory Information Management Systems (LIMS) have grown from mere inventory systems into increasingly comprehensive software platforms, spanning functionalities as diverse as data search, annotation and analysis. In 2011, our institution started a LIMS project named the Laboratory Assistant Suite with the purpose of assisting researchers throughout all of their laboratory activities, providing graphical tools to support decision-making tasks and building complex analyses on integrated data. The modular architecture of the system exploits multiple databases with different technologies. To provide an efficient and easy tool for retrieving information of interest, we developed the Multi-Dimensional Data Manager (MDDM). By means of intuitive interfaces, scientists can execute complex queries without any knowledge of query languages or database structures, and easily integrate heterogeneous data stored in multiple databases. Together with the other software modules making up the platform, the MDDM has helped improve the overall quality of the data, substantially reduced the time spent on manual data entry and retrieval and ultimately broadened the spectrum of interconnections among the data, offering novel perspectives to biomedical analysts.
Handling large knowledge bases of information from different domains, such as the World Wide Web, is a complex problem that the Resource Description Framework (RDF) addresses by adding semantic meaning to the data itself. The growing amount of linked data has brought with it a number of specialized databases capable of storing and processing RDF data, called RDF stores. We explore the RDF store landscape with the aim of finding an RDF store that sufficiently meets the storage needs of an enhanced living environment, more concretely the requirements of a Smart Space platform designed to run on a cluster of low-power hardware that can operate entirely locally at home, with the purpose of logging data for a reactive assistive system involving, e.g., activity recognition or domotics. We present a literature analysis of RDF stores and identify promising candidates for the implementation of consumer Smart Spaces. Based on the insights provided by our study, we conclude by suggesting relevant aspects of RDF storage systems that need to be considered in Ambient Assisted Living environments and by comparing the available solutions.
Patient-derived tumor xenograft (PDX) mouse models are a versatile oncology research platform for studying tumor biology and for testing chemotherapeutic approaches tailored to genomic characteristics of individual patients’ tumors. PDX models are generated and distributed by a diverse group of academic labs, multi-institution consortia and contract research organizations. The distributed nature of PDX repositories and the use of different metadata standards for describing model characteristics present a significant challenge to identifying PDX models relevant to specific cancer research questions. The Jackson Laboratory and EMBL-EBI are addressing these challenges by co-developing PDX Finder, a comprehensive open global catalog of PDX models and their associated datasets. Within PDX Finder, model attributes are harmonized and integrated using a previously developed community minimal information standard to support consistent searching across the originating resources. Links to repositories are provided from the PDX Finder search results to facilitate model acquisition and/or collaboration. The PDX Finder resource currently contains information for 1985 PDX models of diverse cancers including those from large resources such as the Patient-Derived Models Repository, PDXNet and EurOPDX. Individuals or organizations that generate and distribute PDXs are invited to increase the ‘findability’ of their models by participating in the PDX Finder initiative at www.pdxfinder.org.
Heart disease has been the leading cause of death in the United States since 1910 and cancer the second leading cause of death since 1933. However, cancer emerged recently as the leading cause of death in many US states. The objective of this study was to provide an in-depth analysis of age-standardized annual state-specific mortality rates for heart disease and cancer.
We used population-based mortality data from 1999 through 2016 to compare 2 underlying cause-of-death categories: diseases of heart (International Classification of Diseases, 10th Revision [ICD-10] codes I00-I09, I11, I13, and I20-I51) and malignant neoplasms (ICD-10 codes C00-C97). We calculated age-standardized annual state-specific mortality rate ratios (MRRs) as heart disease mortality rate divided by cancer mortality rate.
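
The mortality rate ratio defined above can be illustrated with a minimal Python sketch. The state names and rates below are invented for illustration and are not taken from the study; the function name `mortality_rate_ratio` is likewise hypothetical. An MRR above 1 indicates that heart disease mortality predominates, and an MRR below 1 indicates that cancer mortality predominates.

```python
from statistics import median


def mortality_rate_ratio(heart_rate, cancer_rate):
    """MRR = age-standardized heart disease rate / age-standardized cancer rate."""
    if cancer_rate <= 0:
        raise ValueError("cancer mortality rate must be positive")
    return heart_rate / cancer_rate


# Hypothetical age-standardized rates per 100,000 population:
# state -> (heart disease rate, cancer rate). Not real data.
rates = {
    "State A": (250.0, 200.0),
    "State B": (180.0, 190.0),
    "State C": (210.0, 200.0),
}

# One MRR per state.
mrrs = {state: mortality_rate_ratio(h, c) for state, (h, c) in rates.items()}

# Median state-specific MRR, as summarized across states in the study.
median_mrr = median(mrrs.values())

# States where cancer mortality exceeds heart disease mortality (MRR < 1).
cancer_leading = [state for state, ratio in mrrs.items() if ratio < 1.0]

if __name__ == "__main__":
    for state, ratio in mrrs.items():
        print(f"{state}: MRR = {ratio:.2f}")
    print(f"Median MRR = {median_mrr:.2f}")
    print("Cancer-leading states:", cancer_leading)
```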
In 1999, age-standardized heart disease mortality exceeded that for cancer in all 50 states. Median state-specific MRR in 1999 was 1.26 (interquartile range [IQR], 1.17-1.34; range, 1.03-1.56), indicating predominance of heart disease mortality nationwide. Median state-specific MRR decreased annually through 2010, reaching a low of 1.00 (IQR, 0.95-1.07; range, 0.71-1.25), indicating that predominance of heart disease mortality prevailed in approximately half of states. Median state-specific MRR increased to 1.03 (IQR, 0.97-1.12; range, 0.77-1.31) in 2016. In 2016, age-standardized cancer mortality exceeded that for heart disease in 19 states. State-level transitions were most apparent for people aged 65 to 84 and affected men, women, and all racial/ethnic groups.
State-level data indicated heterogeneity across US states in the predominance of heart disease mortality relative to cancer mortality. Timing and magnitude of transitions toward cancer mortality predominance varied by state.
Precision oncology relies on the accurate discovery and interpretation of genomic variants to enable individualized therapy selection, diagnosis, or prognosis. However, knowledgebases containing clinical interpretations of somatic cancer variants are highly disparate in interpretation content, structure, and supporting primary literature, reducing consistency and impeding consensus when evaluating variants and their relevance in a clinical setting. With the cooperation of experts of the Global Alliance for Genomics and Health (GA4GH) and of six prominent cancer variant knowledgebases, we developed a framework for aggregating and harmonizing variant interpretations to produce a meta-knowledgebase of 12,856 aggregate interpretations covering 3,437 unique variants in 415 genes, 357 diseases, and 791 drugs. We demonstrated large gains in overlapping terms between resources across variants, diseases, and drugs as a result of this harmonization. We subsequently demonstrated improved matching between patients of the GENIE cohort and harmonized interpretations of potential clinical significance, observing an increase from an average of 34% to 57% in aggregate. We developed an open and freely available web interface for exploring the harmonized interpretations from these six knowledgebases at search.cancervariants.org.
To achieve more accurate disease prevention, diagnosis, and
treatment, clinical and genetic data need to be studied extensively and in
systematic association. As one way to achieve precision medicine, a laboratory
information management system (LIMS) can effectively associate clinical data at
the macro level with genomic data at the micro level. This chapter summarizes the
application of LIMS to clinical data management and its modes of implementation. It
also discusses the principles of a LIMS in clinical data management, as well as the
opportunities and challenges in the context of medical informatics.
While tumor genome sequencing has become widely available in clinical and research settings, the interpretation of tumor somatic variants remains an important bottleneck. Here we present the Cancer Genome Interpreter, a versatile platform that automates the interpretation of newly sequenced cancer genomes, annotating the potential of alterations detected in tumors to act as drivers and their possible effect on treatment response. The results are organized in different levels of evidence according to current knowledge, which we envision can support a broad range of oncology use cases. The resource is publicly available at http://www.cancergenomeinterpreter.org.
Electronic supplementary material
The online version of this article (10.1186/s13073-018-0531-8) contains supplementary material, which is available to authorized users.
With prospective clinical sequencing of tumors emerging as a mainstay in cancer care, an urgent need exists for a clinical support tool that distills the clinical implications associated with specific mutation events into a standardized and easily interpretable format. To this end, we developed OncoKB, an expert-guided precision oncology knowledge base.
OncoKB annotates the biologic and oncogenic effects and prognostic and predictive significance of somatic molecular alterations. Potential treatment implications are stratified by the level of evidence that a specific molecular alteration is predictive of drug response on the basis of US Food and Drug Administration labeling, National Comprehensive Cancer Network guidelines, disease-focused expert group recommendations, and scientific literature.
To date, > 3,000 unique mutations, fusions, and copy number alterations in 418 cancer-associated genes have been annotated. To test the utility of OncoKB, we annotated all genomic events in 5,983 primary tumor samples in 19 cancer types. Forty-one percent of samples harbored at least one potentially actionable alteration, of which 7.5% were predictive of clinical benefit from a standard treatment. OncoKB annotations are available through a public Web resource (http://oncokb.org) and are incorporated into the cBioPortal for Cancer Genomics to facilitate the interpretation of genomic alterations by physicians and researchers.
OncoKB, a comprehensive and curated precision oncology knowledge base, offers oncologists detailed, evidence-based information about individual somatic mutations and structural alterations present in patient tumors with the goal of supporting optimal treatment decisions.