Article

Visual cluster analysis in support of clinical decision intelligence.

IBM T.J. Watson Research Center, New York, USA.
AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium 01/2011; 2011:481-90.
Source: PubMed

ABSTRACT Electronic health records (EHRs) contain a wealth of information about patients. In addition to providing efficient and accurate records for individual patients, large databases of EHRs contain valuable information about overall patient populations. While statistical insights describing an overall population are beneficial, they are often not specific enough to use as the basis for individualized patient-centric decisions. To address this challenge, we describe an approach based on patient similarity which analyzes an EHR database to extract a cohort of patient records most similar to a specific target patient. Clusters of similar patients are then visualized to allow interactive visual refinement by human experts. Statistics are then extracted from the refined patient clusters and displayed to users. The statistical insights taken from these refined clusters provide personalized guidance for complex decisions. This paper focuses on the cluster refinement stage where an expert user must interactively (a) judge the quality and contents of automatically generated similar patient clusters, and (b) refine the clusters based on his/her expertise. We describe the DICON visualization tool which allows users to interactively view and refine multidimensional similar patient clusters. We also present results from a preliminary evaluation where two medical doctors provided feedback on our approach.
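The pipeline outlined in the abstract (retrieve a cohort similar to a target patient, let an expert refine it, then summarize the refined cohort) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not specify the similarity measure, so plain Euclidean distance is used; the two-element feature vectors, the record layout, and the function names `similar_cohort` and `cohort_statistics` are hypothetical; and the interactive refinement that DICON supports visually is stood in for by a simple filter.

```python
import math
from statistics import mean


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def similar_cohort(records, target, k):
    """Return the k records whose features are closest to the target patient."""
    return sorted(records, key=lambda r: euclidean(r["features"], target))[:k]


def cohort_statistics(cohort):
    """Per-feature mean over the cohort, as one example of an extracted statistic."""
    n_features = len(cohort[0]["features"])
    return [mean(r["features"][i] for r in cohort) for i in range(n_features)]


# Hypothetical toy records: two made-up numeric features per patient.
records = [
    {"id": "p1", "features": [60.0, 130.0]},
    {"id": "p2", "features": [62.0, 128.0]},
    {"id": "p3", "features": [30.0, 110.0]},
    {"id": "p4", "features": [61.0, 135.0]},
]
target = [61.0, 130.0]

# Step 1: automatically retrieve the most similar patients.
cohort = similar_cohort(records, target, k=3)

# Step 2: stand-in for expert refinement; here the expert excludes patient p4.
refined = [r for r in cohort if r["id"] != "p4"]

# Step 3: extract statistics from the refined cohort.
stats = cohort_statistics(refined)
print([r["id"] for r in cohort], stats)
```

In a real setting the refinement step would be driven by the expert's judgment of cluster quality rather than a hard-coded exclusion, and the statistics would depend on the clinical decision at hand.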

    ABSTRACT: Chignell et al. [1] previously described a methodology for converting a large set of confidential data records into a set of summaries of similar patients. They claimed that the resulting patient types could "capture important trends and patterns in the data set without disclosing the information in any of the individual data records." In this paper we examine the predictive validity of an initial set of patient types developed by [1]. We ask the following question: To what extent can the summarized data derived from each cluster (patient type) be as informative as the original case-level data (individuals) from which the clusters were inferred? We address this question by assessing how well predictions made with summarized data match predictions made with original data. After reviewing relevant literature and explaining how data is summarized in each cluster of similar patients, we compare the results of predicting death in the ICU using both summarized data (regression analysis) and original case data (discriminant analysis and logistic regression analysis). When multiple clusters were used, prediction based on regression analysis of the summarized data was found to be better than prediction using either logistic regression or discriminant analysis on the raw data. We hypothesize that this result is due to segmentation of a heterogeneous multivariate space into more homogeneous subregions. We see the present results as an important step towards the development of generalized health data search engines that can utilize non-confidential summarized data passed through health data repository firewalls.
  • Journal of comparative effectiveness research. 11/2013; 2(6):529-32.
