Building public trust in uses of Health Insurance Portability and Accountability Act de-identified data

Journal of the American Medical Informatics Association (Impact Factor: 3.93). 06/2012; 20(1). DOI: 10.1136/amiajnl-2012-000936
Source: PubMed

ABSTRACT
OBJECTIVES: The aim of this paper is to summarize concerns with the de-identification standard and methodologies established under the Health Insurance Portability and Accountability Act (HIPAA) regulations, and to report potential policies for addressing those concerns, as discussed at a recent workshop attended by industry, consumer, academic, and research stakeholders.
TARGET AUDIENCE: The target audience includes researchers, industry stakeholders, policy makers, and consumer advocates concerned about preserving the ability to use HIPAA de-identified data for a range of important secondary uses.
SCOPE: HIPAA sets forth methodologies for de-identifying health data; once such data are de-identified, they are no longer subject to HIPAA regulations and can be used for any purpose. Concerns have been raised about the sufficiency of HIPAA de-identification methodologies, the lack of legal accountability for unauthorized re-identification of de-identified data, and insufficient public transparency about de-identified data uses. Although there is little published evidence of the re-identification of properly de-identified datasets, such concerns appear to be increasing. This article discusses policy proposals intended to address de-identification concerns while maintaining de-identification as an effective tool for protecting privacy and preserving the ability to leverage health data for secondary purposes.

  • ABSTRACT: We describe functional specifications and practicalities in the software development process for a web service that allows construction of a multivariate logistic regression model, Grid Logistic Regression (GLORE), by aggregating partial estimates from distributed sites, with no exchange of patient-level data. We recently developed and published a web service for model construction and data analysis in a distributed environment. That paper provided an overview of the system that is useful for users, but included very few details relevant to biomedical informatics developers or network security personnel who may be interested in implementing this or similar systems. We focus here on how the system was conceived and implemented. We followed a two-stage development approach, first implementing the backbone system and then incrementally improving the user experience through interactions with potential users during development. Our system went through various stages such as proof of concept, algorithm validation, user interface development, and system testing. We used the Zoho Project management system to track tasks and milestones. We leveraged Google Code and Apache Subversion to share code among team members, and developed an applet-servlet architecture to support cross-platform deployment. During the development process, we encountered challenges such as Information Technology (IT) infrastructure gaps and limited team experience in user-interface design. We identified solutions as well as enabling factors to support the translation of an innovative privacy-preserving, distributed modeling technology into a working prototype. Using GLORE (a distributed model that we developed earlier) as a pilot example, we demonstrated the feasibility of building and integrating distributed modeling technology into a usable framework that can support privacy-preserving, distributed data analysis among researchers at geographically dispersed institutes.
    12/2014; 2(1):1053. DOI:10.13063/2327-9214.1053
  • ABSTRACT: Genome-wide methylation arrays are increasingly used tools in studies of complex medical disorders. Because of their expense and potential utility to the scientific community, current federal policy dictates that data from these arrays, like those from genome-wide genotyping arrays, be deposited in publicly available databases. Unlike the genotyping information, access to the expression data is not restricted. An underlying supposition in the current nonrestricted access to methylation data is the belief that protected health and personal identifying information cannot be simultaneously extracted from these arrays. In this communication, we analyze methylation data from the Illumina HumanMethylation450 array and show that genotype at 1,069 highly informative loci, and both alcohol and smoking consumption information, can be derived from the array data. We conclude that both potentially personally identifying information and substance-use histories can be simultaneously derived from methylation array data. Because access to genetic information about a database subject or one of their relatives is critical to the re-identification process, this risk of re-identification is limited at the current time. We propose that access to genome-wide methylation data be restricted to institutionally approved investigators who accede to data use agreements prohibiting re-identification.
    11/2014; 6(1):28. DOI:10.1186/1868-7083-6-28
  • ABSTRACT: De-identification of personal health information is essential if the information is to be used without written informed consent from patients. Previous de-identification methods used natural language processing technology to remove identifiers from clinical narrative text, but these methods focused only on narrative text written in English. In this study, we propose a regular expression-based de-identification method for bilingual clinical records written in Korean and English. To develop and validate regular expression rules, we obtained development and validation datasets composed of 6,039 clinical notes of 20 types and 5,000 notes of 33 types, respectively. Fifteen regular expression rules were constructed using the development dataset, and those rules achieved 99.87% precision and 96.25% recall on the validation dataset. Our de-identification method successfully removed the identifiers in diverse types of bilingual clinical narrative texts. This method should thus help physicians perform retrospective research more easily.
    Journal of Korean Medical Science 01/2015; 30(1):7-15. DOI:10.3346/jkms.2015.30.1.7 · 1.25 Impact Factor
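
The regular expression-based approach described in the last abstract above can be illustrated with a minimal sketch. The three rules below are hypothetical examples written for this illustration; they are not the fifteen rules from the study, and the names `RULES`, `deidentify`, and `precision_recall` are assumptions. The sketch uses digit lookarounds rather than `\b`, because Hangul characters count as word characters in Python's `re` module and a word boundary would not fire between Korean text and an adjacent number.

```python
import re

# Hypothetical rules for illustration only; NOT the rules from the cited study.
# Lookarounds (?<!\d) and (?!\d) stand in for \b, which does not trigger
# between a Hangul character and a digit.
RULES = {
    "PHONE": re.compile(r"(?<!\d)\d{2,3}-\d{3,4}-\d{4}(?!\d)"),
    "DATE": re.compile(r"(?<!\d)\d{4}[-./]\d{1,2}[-./]\d{1,2}(?!\d)"),
    # Korean resident registration number layout: 6 digits, hyphen, 7 digits.
    "RRN": re.compile(r"(?<!\d)\d{6}-\d{7}(?!\d)"),
}


def deidentify(text):
    """Replace each rule match with a [LABEL] placeholder.

    Returns the scrubbed text and the list of matched strings.
    """
    found = []
    for label, pattern in RULES.items():
        found.extend(m.group(0) for m in pattern.finditer(text))
        text = pattern.sub(f"[{label}]", text)
    return text, found


def precision_recall(predicted, gold):
    """Precision and recall of predicted identifiers against a gold set."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


if __name__ == "__main__":
    # Made-up bilingual note; the name is a common Korean placeholder name.
    note = "홍길동 환자, 연락처 010-1234-5678, visited on 2014-03-02."
    scrubbed, found = deidentify(note)
    gold = ["홍길동", "010-1234-5678", "2014-03-02"]
    print(scrubbed)
    print(precision_recall(found, gold))
```

In this made-up note the patient name is not covered by any rule, so recall falls below precision. That is the same kind of gap the abstract reports between its precision and recall figures, although the causes in the actual study are not stated here.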
