Genomics and Privacy: Implications of the New Reality of Closed Data for the Field

Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut, USA.
PLoS Computational Biology (Impact Factor: 4.62). 12/2011; 7(12):e1002278. DOI: 10.1371/journal.pcbi.1002278
Source: PubMed


Open source and open data have been driving forces in bioinformatics in the past. However, privacy concerns may soon change the landscape, limiting future access to important data sets, including personal genomics data. Here we survey this situation in some detail, describing, in particular, how the large scale of the data from personal genomic sequencing makes it especially hard to share data, exacerbating the privacy problem. We also go over various aspects of genomic privacy: first, there is basic identifiability of subjects having their genome sequenced. However, even for individuals who have consented to be identified, there is the prospect of very detailed future characterization of their genotype, which, unanticipated at the time of their consent, may be more personal and invasive than the release of their medical records. We go over various computational strategies for dealing with the issue of genomic privacy. One can "slice" and reformat datasets to allow them to be partially shared while securing the most private variants. This is particularly applicable to functional genomics information, which can be largely processed without variant information. For handling the most private data there are a number of legal and technological approaches, for example, modifying the informed consent procedure to acknowledge that privacy cannot be guaranteed, and/or employing a secure cloud computing environment. Cloud computing in particular may allow access to the data in a more controlled fashion than the current practice of downloading and computing on large datasets. Furthermore, it may be particularly advantageous for small labs, given that the burden of many privacy issues falls disproportionately on them in comparison to large corporations and genome centers. Finally, we discuss how education of future genetics researchers will be important, with curriculums emphasizing privacy and data security. However, teaching personal genomics with identifiable subjects in the university setting will, in turn, create additional privacy issues and social conundrums.
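
The "slicing" idea mentioned in the abstract can be illustrated with a minimal sketch. The record layout, field names, and the example variant string below are hypothetical and are not taken from the paper; the sketch only shows the general pattern of separating a shareable functional-genomics view from the variant information that stays under access control.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class SampleRecord:
    """Hypothetical per-sample record; field names are illustrative only."""
    sample_id: str
    expression: Dict[str, float]                        # gene name -> expression level
    variants: List[str] = field(default_factory=list)   # most private part


def slice_record(record: SampleRecord) -> Tuple[dict, dict]:
    """Split a record into a shareable slice and a restricted slice.

    The shareable slice carries only expression values, with no sample
    identifier and no variants. The restricted slice keeps the identifier
    and the variants, and is assumed to stay in a controlled environment.
    """
    shareable = {"expression": dict(record.expression)}
    restricted = {
        "sample_id": record.sample_id,
        "variants": list(record.variants),
    }
    return shareable, restricted


# Example with made-up values.
record = SampleRecord("S1", {"TP53": 12.5, "BRCA1": 3.1}, ["chr17:g.100A>G"])
shareable, restricted = slice_record(record)
```

Dropping the variant field does not by itself guarantee that the remaining data are non-identifying; the sketch only separates the two views so that each can be governed by a different access policy.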



Available from: Andrea Sboner
  • Source
    • "These attacks must be efficiently addressed to avoid a rollback on the trend to share DNA sequences, which would hurt genomic studies, or even harden regulations governing genomic data protection [5]. Detecting privacy-sensitive genomic data as soon as it is generated is a long-term ambition from the research and clinical communities [10] [15]. Recent works on privacy-preserving genome processing have been advocating the partitioning of genomic data, but assume this must be done manually [2] or by a tool out of their scope [20]. "
    ABSTRACT: Finding the balance between privacy protection and data sharing is one of the main challenges in managing human genomic data nowadays. Novel privacy-enhancing technologies are required to address the known disclosure threats to personal sensitive genomic data without precluding data sharing. In this paper, we propose a method that systematically detects privacy-sensitive DNA segments coming directly from an input stream, using as reference a knowledge database of known privacy-sensitive nucleic and amino acid sequences. We show that adding our detection method to standard security techniques provides a robust, efficient privacy-preserving solution that neutralizes threats related to recently published attacks on genome privacy based on short tandem repeats, disease-related genes, and genomic variations. Current global knowledge on human genomes demonstrates the feasibility of our approach to obtain a comprehensive database immediately, which can also evolve automatically to address future attacks as new privacy-sensitive sequences are identified. Additionally, we validate that the detection method can be fitted inline with the NGS—Next Generation Sequencing—production cycle by using Bloom filters and scaling out to faster sequencing machines.
    Full-text · Conference Paper · Oct 2015
  • Source
    • "Since the late 1970s, the U.S. Department of Veterans Affairs (VA), as a governmental body, has advanced its efforts to develop an extensive organizational health information system named the Veterans Health Information Systems and Technology Architecture (VistA). VistA uses the Massachusetts General Hospital Utility Multi-Programming System (MUMPS), a program that can be used for disease case registries.[10] Only a few major organizations in the private sector worked on the implementation of EHRs in the USA.[11] "
    ABSTRACT: Many projects on developing Electronic Health Record (EHR) systems have been carried out in many countries. The current study was conducted to review the published data on the utilization of open source EHR systems in different countries all over the world. Using free text and keyword search techniques, six bibliographic databases were searched for related articles. The identified papers were screened and reviewed in several stages for relevance and validity. The findings showed that open source EHRs have been widely used in resource-limited regions on all continents, especially in Sub-Saharan Africa and South America. This creates opportunities to improve the national level of healthcare, especially in developing countries with minimal financial resources. Open source technology is a solution to overcome the problems of high costs and inflexibility associated with proprietary health information systems.
    Full-text · Article · Jan 2014 · Journal of Research in Medical Sciences
  • Source
    • "Furthermore, personalized medicine has to be based on the careful analysis of multifaceted data. By their very design next-generation-sequencing technologies and other high-throughput methods imply the involvement of many different persons and even external organizations in the data collection and analysis process, with all the corresponding additional risks to patient privacy [2,6]. To retain a semantic reference between patient and sample, while still complying with data privacy requirements, pseudonymization is the method of choice [7,8]. "
    ABSTRACT: The usage of patient data for research poses risks concerning the patients' privacy and informational self-determination. Next-generation-sequencing technologies and various other methods obtain data from biospecimens, both for translational research and personalized medicine. If these biospecimens are anonymized, individual research results from genomic research, which should be offered to patients in a clinically relevant timeframe, cannot be associated back to the individual. This raises an ethical concern and challenges the legitimacy of anonymized patient samples. In this paper we present a new approach which supports both data privacy and the possibility to give feedback to patients about their individual research results. We examined previously published privacy concepts regarding a streamlined de-pseudonymization process and a patient-based pseudonym as applicable to research with genomic data and warehousing approaches. All concepts identified in the literature review were compared to each other and analyzed for their applicability to translational research projects. We evaluated how these concepts cope with the challenges posed by personalized medicine. Both person-centricity issues and a separation of pseudonymization and de-pseudonymization stood out as central themes in our examination. This motivated us to enhance an existing pseudonymization method regarding a separation of duties. The existing concepts rely on external trusted third parties, making de-pseudonymization a multistage process involving additional interpersonal communication, which might cause critical delays in patient care. Therefore we propose an enhanced method with an asymmetric encryption scheme separating the duties of pseudonymization and de-pseudonymization. The pseudonymization service provider is unable to derive the patient identifier from the pseudonym, but assigns this ability to an authorized third party (ombudsman) instead. To solve person-centricity issues, a collision-resistant function is incorporated into the method. These two facts combined enable us to address essential challenges in translational research. A productive software prototype was implemented to prove the functionality of the suggested translational, data privacy-preserving method. Finally, we performed a threat analysis to evaluate potential hazards connected with this pseudonymization method. The proposed method offers sustainable organizational simplification regarding an ethically indicated, but secure and controlled process of de-pseudonymizing patients. A pseudonym is patient-centered to allow correlating separate datasets from one patient. Therefore, this method bridges the gap between bench and bedside in translational research while preserving patient privacy. Assigned ombudsmen are able to de-pseudonymize a patient, if an individual research result is clinically relevant.
    Full-text · Article · Jul 2013 · BMC Medical Informatics and Decision Making
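
The patient-centered pseudonym described in the last abstract above can be illustrated with a minimal sketch. This is not the authors' method: it swaps in a keyed hash (HMAC-SHA256 from the Python standard library) to show only how a collision-resistant function can give one patient the same pseudonym across datasets. The asymmetric encryption step that lets an ombudsman de-pseudonymize a patient is not shown, and the function name, key, and patient identifiers are hypothetical.

```python
import hashlib
import hmac


def make_pseudonym(patient_id: str, secret_key: bytes) -> str:
    """Derive a deterministic pseudonym from a patient identifier.

    The same identifier and key always yield the same pseudonym, so
    separate datasets from one patient can be correlated. Without the
    key, the pseudonym cannot feasibly be recomputed from a guessed
    identifier. This keyed hash is one-way: it does NOT support the
    de-pseudonymization step described in the abstract.
    """
    digest = hmac.new(secret_key, patient_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()


# Hypothetical key and identifiers, for illustration only.
key = b"example-key-held-by-the-pseudonymization-service"
p1 = make_pseudonym("patient-0001", key)
p2 = make_pseudonym("patient-0001", key)
p3 = make_pseudonym("patient-0002", key)
```

Here `p1` and `p2` are identical because they come from the same patient identifier, while `p3` differs. In a real deployment the key would need to be stored and managed securely, which is outside the scope of this sketch.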