Temporal disease trajectories condensed from population-wide registry data covering 6.2 million patients

Nature Communications (Impact Factor: 11.47). 06/2014; 5:4022. DOI: 10.1038/ncomms5022
Source: PubMed


A key prerequisite for precision medicine is the estimation of disease progression from the current patient state. Disease correlations and temporal disease progression (trajectories) have mainly been analysed either with a focus on a small number of diseases or with large-scale approaches that do not consider time spans exceeding a few years. So far, no large-scale studies have focused on defining a comprehensive set of disease trajectories. Here we present a discovery-driven analysis of temporal disease progression patterns using data from an electronic health registry covering the whole population of Denmark. We use the entire spectrum of diseases and convert 14.9 years of registry data on 6.2 million patients into 1,171 significant trajectories. We group these into patterns centred on a small number of key diagnoses such as chronic obstructive pulmonary disease (COPD) and gout, which are central to disease progression and hence important to diagnose early to mitigate the risk of adverse outcomes. We suggest such trajectory analyses may be useful for predicting and preventing future diseases of individual patients.
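
As a rough illustration of what a temporally directed diagnosis pair can look like in data, the following minimal Python sketch counts how often one diagnosis is first recorded before another and applies a one-sided exact binomial test to the direction. The abstract does not describe the statistical procedure, so the test, the function names (first_diagnoses, count_directed_pairs, direction_p_value), the codes "A" and "B", and the toy records are all assumptions made for illustration, not the authors' method or data.

```python
from collections import Counter
from datetime import date
from math import comb


def first_diagnoses(history):
    """Map each diagnosis code to the earliest date it appears for one patient."""
    first = {}
    for when, code in history:
        if code not in first or when < first[code]:
            first[code] = when
    return first


def count_directed_pairs(patients):
    """Count, over all patients, how often code a is first seen strictly before code b."""
    counts = Counter()
    for history in patients.values():
        first = first_diagnoses(history)
        for a, date_a in first.items():
            for b, date_b in first.items():
                if a != b and date_a < date_b:
                    counts[(a, b)] += 1
    return counts


def direction_p_value(n_ab, n_ba):
    """One-sided exact binomial p-value for a->b outnumbering b->a (null: 50/50)."""
    n = n_ab + n_ba
    return sum(comb(n, k) for k in range(n_ab, n + 1)) / 2 ** n


# Hypothetical toy records (not registry data): patient id -> [(date, code), ...]
patients = {
    1: [(date(2001, 1, 5), "A"), (date(2004, 3, 1), "B")],
    2: [(date(2000, 6, 9), "A"), (date(2002, 2, 2), "B")],
    3: [(date(2003, 4, 4), "B"), (date(2005, 8, 8), "A")],
    4: [(date(1999, 9, 9), "A"), (date(2001, 1, 1), "B")],
}

pairs = count_directed_pairs(patients)
n_ab = pairs[("A", "B")]
n_ba = pairs[("B", "A")]
p = direction_p_value(n_ab, n_ba)
print(f"A->B: {n_ab}, B->A: {n_ba}, one-sided p = {p:.4f}")
```

Under these assumptions, longer trajectories could be assembled by chaining directed pairs that pass such a test. With millions of patients, the quadratic inner loop and the exact binomial sum would need more efficient replacements.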

37 Reads
  • Source
    • "It is difficult to find the trajectory of conditions that may be correlated. Research has been carried out on predicting incidence [2] [3] [4] [5] and progression trajectories [6] [7] from EHR data. However, EHR datasets are usually limited to one medical site or network and have limited coverage of population and time period. "
    ABSTRACT: In the U.S., 80% of Medicare spending is for managing patients with multiple coexisting conditions. Predicting potentially correlated diseases for an individual patient and correlated disease progression paths are both important research tasks. For example, obese patients are at an increased risk of developing type-2 diabetes and hypertension. Such a correlation is called a comorbidity relationship. Discovering comorbidity relationships is complex and difficult because privacy laws limit access to Electronic Health Records. In this paper, we present a framework called Social Data-based Prediction of Incidence and Trajectory to predict potential risks for medical conditions as well as their progression trajectories, in order to identify the comorbidity path. The framework utilizes patients’ publicly available social media data and presents a collaborative prediction model to predict the ranked list of potential comorbidity incidences, and a trajectory prediction model to reveal different paths of condition progression. The experimental results show that our framework is able to predict future conditions for online patients with a coverage value of 48% and 75% for a top-20 and a top-100 ranked list, respectively. For risk trajectory prediction, our framework is able to reveal each potential progression trajectory between any two conditions and infer the confidence of the future trajectory, given any observed condition. The predicted trajectories are validated with existing comorbidity relations from the medical literature.
    2015 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), Washington D.C.; 11/2015
  • Source
    ABSTRACT: Multiple diseases (acute or chronic events) can occur together in a patient, a phenomenon known as disease comorbidity, because of the multi-way associations among diseases. Due to shared genetic, molecular, environmental, and lifestyle-based risk factors, many diseases are comorbid in the same patient. Methods for integrating multiple types of omics data play an important role in identifying integrative biomarkers for stratification of patients into groups with different clinical outcomes. Moreover, integrated omics and clinical information may potentially improve prediction accuracy of disease comorbidities. However, there is a lack of effective and efficient bioinformatics and statistical software for true integrative data analysis. With the widespread availability of large-scale omics, phenotype and ontology information, it is becoming more and more practical to help doctors in clinical diagnostics and comorbidity prediction by providing an appropriate software tool. We developed an R software package, POGO, to compute novel estimators of disease comorbidity risks and patient stratification. Starting from the initial diagnosis, omics and clinical data of a patient, the software identifies the association risk of disease comorbidities. The input of this software is the initial diagnosis of a patient and the output provides evidence of disease comorbidities. The functions of POGO offer flexibility for diagnostic applications to predict disease comorbidities, and can be easily integrated into high-throughput and clinical data analysis pipelines. POGO is compliant with the Bioconductor standard and is freely available at www.cl.cam.ac.uk/~mam211/POGO/.
    Frontiers in Cell and Developmental Biology 06/2015; 3:28. DOI:10.3389/fcell.2015.00028
  • ABSTRACT: Health care research focuses on the description and analysis of the health care system and its requirements. Research-derived innovations are the subject of trials and evaluation of the transfer to daily routine. For this purpose health care research has developed a broad theory-based spectrum of methods. On the other hand, the concept of big data is a new informatics-driven approach to large data sets independent of content. With its technical vocabulary, the concept of big data does not easily fit into traditional health care research. Central tasks of health care research such as the generation of theories, norm-oriented evaluations or proof of causality can neither be supported nor replaced by big data. However, the concept of big data has the potential to support health care research, with traditional tasks such as data linkage, analysis of health care paths, quick access to up-to-date data on the distribution and acceptance of health care services, as well as prediction and the generation of hypotheses. The prerequisite for all this is a trust-based linkage of different medical and nonmedical data sources on the basis of the legal regulation of data access and data protection.
    Bundesgesundheitsblatt - Gesundheitsforschung - Gesundheitsschutz 06/2015; DOI:10.1007/s00103-015-2183-9 · 1.42 Impact Factor
