SHARE: system design and case studies for statistical health information release

Digital Reasoning Systems Inc, Franklin, Tennessee, USA.
Journal of the American Medical Informatics Association (Impact Factor: 3.5). 10/2012; 20(1). DOI: 10.1136/amiajnl-2012-001032
Source: PubMed


Objectives We present SHARE, a new system for statistical health information release with differential privacy. We present two case studies that evaluate the software on real medical datasets and demonstrate the feasibility and utility of applying the differential privacy framework on biomedical data.
Materials and Methods SHARE releases statistical information from electronic health records with differential privacy, a strong privacy framework for statistical data release. It includes a number of state-of-the-art methods for releasing multidimensional histograms and longitudinal patterns. We performed a variety of experiments on two real datasets, the Surveillance, Epidemiology, and End Results (SEER) breast cancer dataset and the Emory electronic medical record (EeMR) dataset, to demonstrate the feasibility and utility of SHARE.
Results Experimental results indicate that SHARE can handle the heterogeneous data found in medical datasets, and that the released statistics are useful. The Kullback–Leibler divergence between the released multidimensional histograms and the original data distribution is below 0.5 and 0.01 for seven-dimensional and three-dimensional data cubes generated from the SEER dataset, respectively. The relative error for longitudinal pattern queries on the EeMR dataset varies between 0 and 0.3. While the results are promising, they also suggest that challenges remain in applying statistical data release using the differential privacy framework to higher dimensional data.
Conclusions SHARE is one of the first systems to provide a mechanism for custodians to release differentially private aggregate statistics for a variety of use cases in the medical domain. This proof-of-concept system is intended to be applied to large-scale medical data warehouses.
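To make the evaluation described in the abstract concrete, the following is a minimal sketch of releasing a histogram with differential privacy and scoring it with the Kullback–Leibler divergence. It is not SHARE's implementation: the abstract does not specify its methods, so the sketch swaps in the textbook Laplace mechanism as a baseline. The function names, the smoothing constant, and the example counts are hypothetical, and the sensitivity of 1 assumes that neighbouring datasets differ by adding or removing one record.

```python
import math
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is distributed as Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def release_histogram(counts, epsilon, rng):
    # Assumption: adding or removing one record changes exactly one
    # cell count by 1, so the L1 sensitivity of the histogram is 1.
    scale = 1.0 / epsilon
    noisy = [c + laplace_noise(scale, rng) for c in counts]
    # Clamping negatives is post-processing and does not weaken
    # the differential privacy guarantee.
    return [max(0.0, v) for v in noisy]


def kl_divergence(original, released, smoothing=1e-9):
    # KL(P || Q) between the normalized histograms. The small
    # smoothing term (an illustrative choice) avoids log(0).
    p_total = sum(original) + smoothing * len(original)
    q_total = sum(released) + smoothing * len(released)
    kl = 0.0
    for o, r in zip(original, released):
        p = (o + smoothing) / p_total
        q = (r + smoothing) / q_total
        kl += p * math.log(p / q)
    return kl


if __name__ == "__main__":
    rng = random.Random(0)
    counts = [120, 45, 300, 8, 0, 77]  # made-up cell counts
    noisy = release_histogram(counts, epsilon=1.0, rng=rng)
    print(kl_divergence(counts, noisy))
```

With a fixed privacy budget, the same noise scale is applied to every cell, so a higher dimensional cube with many sparse cells accumulates proportionally more distortion. This is one plausible reading of why the reported divergence is larger for the seven-dimensional cube than for the three-dimensional one.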

  • Journal of the American Medical Informatics Association 12/2012; 20(1). DOI:10.1136/amiajnl-2012-001509 · 3.50 Impact Factor
  • ABSTRACT: Current information technology enables many organizations to collect, store, and use massive amounts and various types of information about individuals. While sharing such a wealth of information presents enormous opportunities for data mining applications, data privacy has been a major barrier. Differential privacy is widely accepted as one of the strongest privacy guarantees. While many effective mechanisms have been proposed for specific data mining applications, non-interactive data release to support exploratory data analysis with differential privacy remains an open problem. I will present our Adaptive Differentially Private Data Release (ADP) project, which aims to build a suite of data-driven and adaptive techniques for differentially private data release by exploiting the characteristics of the underlying data. I will present our ongoing work on techniques for handling different types of data including relational, high dimensional, transactional, sequential, and time series data. I will present case studies using real datasets demonstrating the feasibility of using the released data for various data mining tasks such as classification and frequent pattern mining. Finally, I will discuss the challenges and open questions of applying the differential privacy framework to general data sharing.
    2013 IEEE 13th International Conference on Data Mining Workshops (ICDMW); 12/2013
  • ABSTRACT: There is currently limited information on best practices for the development of governance requirements for distributed research networks (DRNs), an emerging model that promotes clinical data reuse and improves timeliness of comparative effectiveness research. Much of the existing information is based on a single type of stakeholder such as researchers or administrators. This paper reports on a triangulated approach to developing DRN data governance requirements based on a combination of policy analysis with experts, interviews with institutional leaders, and patient focus groups. This approach is illustrated with an example from the Scalable National Network for Effectiveness Research, which resulted in 91 requirements. These requirements were analyzed against the Fair Information Practice Principles (FIPPs) and Health Insurance Portability and Accountability Act (HIPAA) protected versus non-protected health information. The requirements addressed all FIPPs, showing how a DRN's technical infrastructure is able to fulfill HIPAA regulations, protect privacy, and provide a trustworthy platform for research.
    Journal of the American Medical Informatics Association 12/2013; 21(4). DOI:10.1136/amiajnl-2013-002308 · 3.50 Impact Factor