SHARE: system design and case studies for statistical health information release

Digital Reasoning Systems Inc, Franklin, Tennessee, USA.
Journal of the American Medical Informatics Association (Impact Factor: 3.5). 10/2012; 20(1). DOI: 10.1136/amiajnl-2012-001032
Source: PubMed


Objectives We present SHARE, a new system for statistical health information release with differential privacy. We also present two case studies that evaluate the software on real medical datasets and demonstrate the feasibility and utility of applying the differential privacy framework to biomedical data.
Materials and Methods SHARE releases statistical information from electronic health records with differential privacy, a strong privacy framework for statistical data release. It includes a number of state-of-the-art methods for releasing multidimensional histograms and longitudinal patterns. We performed a variety of experiments on two real datasets, the Surveillance, Epidemiology, and End Results (SEER) breast cancer dataset and the Emory electronic medical record (EeMR) dataset, to demonstrate the feasibility and utility of SHARE.
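To make the idea of a differentially private histogram release concrete, the following is a minimal sketch of the textbook Laplace mechanism applied to cell counts. It is not SHARE's own algorithm, which the abstract does not specify. The function names, the `bins` argument, and the assumption that neighbouring datasets differ by one record (so each count has sensitivity 1) are illustrative choices, not details taken from the paper.

```python
import random
from collections import Counter


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponential variables with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_histogram(records, bins, epsilon, seed=None):
    """Return noisy counts for every cell in `bins` (hypothetical helper).

    Assumptions: each record falls into at most one cell, and neighbouring
    datasets differ by adding or removing one record, so the histogram has
    L1 sensitivity 1 and Laplace noise of scale 1/epsilon suffices.
    `bins` must be the full, data-independent cell domain, so that empty
    cells also receive noise.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = random.Random(seed)
    counts = Counter(records)
    scale = 1.0 / epsilon
    return {b: counts.get(b, 0) + laplace_noise(scale, rng) for b in bins}


if __name__ == "__main__":
    # Toy, made-up cell labels; not drawn from SEER or EeMR.
    cells = ["stage I", "stage II", "stage III"]
    data = ["stage I", "stage I", "stage II"]
    print(dp_histogram(data, cells, epsilon=1.0, seed=42))
```

With this baseline, the number of cells grows multiplicatively with each added dimension, so the total noise grows while the true counts per cell shrink. That is one common explanation for why higher-dimensional releases are harder.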
Results Experimental results indicate that SHARE can handle the heterogeneous data found in medical datasets, and that the released statistics are useful. The Kullback–Leibler divergence between the released multidimensional histograms and the original data distribution is below 0.5 for seven-dimensional data cubes and below 0.01 for three-dimensional data cubes generated from the SEER dataset. The relative error for longitudinal pattern queries on the EeMR dataset varies between 0 and 0.3. While the results are promising, they also suggest that challenges remain in applying differentially private statistical data release to higher dimensional data.
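The two utility measures named above can be computed as in the sketch below. The abstract does not state the direction of the divergence, the logarithm base, or the exact relative-error formula, so the choices here are assumptions: D_KL(original || released) with natural logarithms, clipping of negative noisy counts before normalising, and a `sanity_bound` that avoids division by zero.

```python
import math


def normalize(counts, floor=0.0):
    # Clip negative noisy counts to `floor`, then rescale to sum to 1.
    clipped = [max(c, floor) for c in counts]
    total = sum(clipped)
    if total <= 0:
        raise ValueError("counts must have positive total mass")
    return [c / total for c in clipped]


def kl_divergence(p, q, smoothing=1e-12):
    # D_KL(P || Q) = sum_i p_i * ln(p_i / q_i); cells with p_i = 0 add 0.
    # `smoothing` (an assumption) keeps the ratio finite when q_i = 0.
    return sum(
        pi * math.log(pi / max(qi, smoothing))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def relative_error(true_answer, noisy_answer, sanity_bound=1.0):
    # |noisy - true| / max(true, sanity_bound); the bound is an assumed
    # convention that prevents division by zero for empty query results.
    return abs(noisy_answer - true_answer) / max(true_answer, sanity_bound)


if __name__ == "__main__":
    # Made-up counts for illustration only.
    original = normalize([40, 35, 25])
    released = normalize([41.3, 33.9, 26.1])
    print(kl_divergence(original, released))
    print(relative_error(100, 130))
```

Under these definitions, a divergence of 0 means the released and original distributions are identical, and a relative error of 0.3 means the noisy answer is off by 30% of the true count.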
Conclusions SHARE is one of the first systems to provide a mechanism for custodians to release differentially private aggregate statistics for a variety of use cases in the medical domain. This proof-of-concept system is intended to be applied to large-scale medical data warehouses.
