Temporal disease trajectories condensed from population-wide registry data covering 6.2 million patients

Article in Nature Communications 5:4022 · June 2014
DOI: 10.1038/ncomms5022 · Source: PubMed
A key prerequisite for precision medicine is the estimation of disease progression from the current patient state. Disease correlations and temporal disease progression (trajectories) have mainly been analysed with a focus on a small number of diseases or using large-scale approaches that do not consider time spans exceeding a few years. So far, no large-scale studies have focused on defining a comprehensive set of disease trajectories. Here we present a discovery-driven analysis of temporal disease progression patterns using data from an electronic health registry covering the whole population of Denmark. We use the entire spectrum of diseases and convert 14.9 years of registry data on 6.2 million patients into 1,171 significant trajectories. We group these into patterns centred on a small number of key diagnoses such as chronic obstructive pulmonary disease (COPD) and gout, which are central to disease progression and hence important to diagnose early to mitigate the risk of adverse outcomes. We suggest such trajectory analyses may be useful for predicting and preventing future diseases of individual patients.


Available from: Anders Boeck Jensen, Mar 15, 2015
    • "Disease similarity study has been attracting more and more attention in recent years, because understanding how the diseases are related to each other not only provides new insights into disease taxonomy and etiology, but also helps to perform drug repositioning and drug target identification [1]. Previous studies have been able to explore disease similarity, i.e., disease relationship, from clinical manifestations [2][3][4][5][6], electronic medical records [7][8][9][10], disease-related genes [1, 11], miRNA [12], proteins [13] or pathways [14], chemical fragments [15], microbiota [16], disease-related metabolic reactions [17], disease-related differentially expressed genes [18][19][20] and multiple types of data [9, 19, 21]. Accordingly, there have been several disease relationship databases, such as 1) the Human Phenotype Ontology (HPO), which shows phenotypic similarities of diseases based on shared clinical synopsis features extracted from OMIM [22]; 2) the Comparative Toxicogenomics Database (CTD), which includes a 'DiseaseComps' section to show similar disorders via a) chemical or gene connections, and b) marker/mechanism or therapeutic associations [23]; 3) Malacards, which presents disease relationships based on similar medical vocabulary concepts and common disease-related genes [24]; 4) DisGeNET, which includes gene-disease associations and disease-disease associations, with disease-disease associations evaluated on shared genes [25]."
    ABSTRACT: Disease similarity study provides new insights into disease taxonomy and pathogenesis, which plays a guiding role in diagnosis and treatment. The early studies were limited to estimating disease similarities based on clinical manifestations, disease-related genes, medical vocabulary concepts or registry data, which were inevitably biased towards well-studied diseases and offered little chance of discovering novel disease relationships. By contrast, genome-scale expression data give us another angle to address this problem, since simultaneous measurement of the expression of thousands of genes allows for the exploration of gene transcriptional regulation, which is believed to be crucial to biological functions. Although methods based on differential expression analysis have the potential to explore new disease relationships, it is difficult to unravel the upstream dysregulation mechanisms of diseases. We therefore estimated disease similarities based on gene expression data by using differential coexpression analysis, a recently emerging method, which has been shown to have more potential to capture dysfunctional regulation mechanisms than differential expression analysis. A total of 1,326 disease relationships among 108 diseases were identified, and the relevant information constituted the human disease network database (DNetDB). Benefiting from the use of differential coexpression analysis, the potential common dysfunctional regulation mechanisms shared by disease pairs (i.e. disease relationships) were extracted and presented. Statistical indicators, common disease-related genes and drugs shared by disease pairs were also included in DNetDB. In total, 1,326 disease relationships among 108 diseases, 5,598 pathways, 7,357 disease-related genes and 342 disease drugs are recorded in DNetDB, among which 3,762 genes and 148 drugs are shared by at least two diseases. DNetDB is the first database focusing on disease similarity from the viewpoint of gene regulation mechanism. It provides an easy-to-use web interface to search and browse the disease relationships and thus helps to systematically investigate etiology and pathogenesis, perform drug repositioning, and design novel therapeutic interventions. Database URL: http://app.scbit.org/DNetDB/#. Electronic supplementary material: The online version of this article (doi:10.1186/s12918-016-0280-5) contains supplementary material, which is available to authorized users.
    Full-text · Article · Dec 2016
    • "More importantly, our results are in line with findings from related studies. In particular, our findings for the disease temporal patterns in the EA cohort are consistent with the disease trajectories identified by Jensen et al. (2014). Direct comparison of results between our study and theirs is difficult, however, namely due to use of non-identical statistical methodologies and ontological disease mappings (ICD-9 versus ICD-10)."
    ABSTRACT: Motivation: Underrepresentation of racial groups represents an important challenge and major gap in phenomics research. Most of the current human phenomics research is based primarily on European populations; hence it is an important challenge to expand it to consider other population groups. One approach is to utilize data from EMR databases that contain patient data from diverse demographics and ancestries. The implications of this racial underrepresentation of data can be profound regarding effects on the healthcare delivery and actionability. To the best of our knowledge, our work is the first attempt to perform comparative, population-scale analyses of disease networks across three different populations, namely Caucasian (EA), African American (AA) and Hispanic/Latino (HL). Results: We compared susceptibility profiles and temporal connectivity patterns for 1988 diseases and 37 282 disease pairs represented in a clinical population of 1 025 573 patients. Accordingly, we revealed appreciable differences in disease susceptibility, temporal patterns, network structure and underlying disease connections between EA, AA and HL populations. We found 2158 significantly comorbid diseases for the EA cohort, 3265 for AA and 672 for HL. We further outlined key disease pair associations unique to each population as well as categorical enrichments of these pairs. Finally, we identified 51 key ‘hub’ diseases that are the focal points in the race-centric networks and of particular clinical importance. Incorporating race-specific disease comorbidity patterns will produce a more accurate and complete picture of the disease landscape overall and could support more precise understanding of disease relationships and patient management towards improved clinical outcomes. Contacts: rong.chen@mssm.edu or joel.dudley@mssm.edu Supplementary information: Supplementary data are available at Bioinformatics online.
    Full-text · Article · Jun 2016
    • "The predicted trajectories cover most of the comorbidities in the literature. More importantly, different from previous research [6, 7], which predicts incidence or visualizes temporal trajectory patterns, our tree-based model predicts the confidence of the future trajectory and reveals every possible path between any two medical conditions. For example, for Eating Disorder (ED) and Panic Disorder (PD), there are at least two paths progressing from ED to PD (ED → Tobacco Addiction → Drug Addiction → PD and ED → Obsessive–Compulsive Disorder → PD)."
    ABSTRACT: Many patients suffer from comorbidity conditions, for example, obese patients often develop type-2 diabetes and hypertension. In the US, 80% of Medicare spending is for managing patients with these multiple coexisting conditions. Predicting potential comorbidity conditions for an individual patient can promote preventive care and reduce costs. Predicting possible comorbidity progression paths can provide important insights into population health and aid with decisions in public health policies. Discovering the comorbidity relationships is complex and difficult, due to limited access to Electronic Health Records by privacy laws. In this paper, we present a collaborative comorbidity prediction method to predict likely comorbid conditions for individual patients, and a trajectory prediction graph model to reveal progression paths of comorbid conditions. Our prediction approaches utilize patient generated health reports on online social media, called Social Health Records (SHR). The experimental results based on one SHR source show that our method is able to predict future comorbid conditions for a patient with coverage values of 48% and 75% for a top-20 and a top-100 ranked list, respectively. For risk trajectory prediction, our approach is able to reveal each potential progression trajectory between any two conditions and infer the confidence of the future trajectory, given any observed condition. The predicted trajectories are validated with existing comorbidity relations from the medical literature.
    Full-text · Article · May 2016