Building a Biomedical Cyberinfrastructure for Collaborative Research

RTI International, 3040 Cornwallis Road, Research Triangle Park, NC 27709, USA.
American journal of preventive medicine (Impact Factor: 4.53). 05/2011; 40(5 Suppl 2):S144-50. DOI: 10.1016/j.amepre.2011.01.018
Source: PubMed


For the potential power of genome-wide association studies (GWAS) and translational medicine to be realized, the biomedical research community must adopt standard measures, vocabularies, and systems to establish an extensible biomedical cyberinfrastructure. Incorporating standard measures will greatly facilitate combining and comparing studies via meta-analysis. Incorporating consensus-based and well-established measures into various studies should reduce the variability across studies due to attributes of measurement, making findings across studies more comparable. This article describes two well-established consensus-based approaches to identifying standard measures and systems: PhenX (consensus measures for phenotypes and eXposures), and the Open Geospatial Consortium (OGC). NIH support for these efforts has produced the PhenX Toolkit, an assembled catalog of standard measures for use in GWAS and other large-scale genomic research efforts, and the RTI Spatial Impact Factor Database (SIFD), a comprehensive repository of geo-referenced variables and extensive meta-data that conforms to OGC standards. The need for coordinated development of cyberinfrastructure to support measures and systems that enhance collaboration and data interoperability is clear; this paper includes a discussion of standard protocols for ensuring data compatibility and interoperability. Adopting a cyberinfrastructure that includes standard measures and vocabularies, and open-source systems architecture, such as the two well-established systems discussed here, will enhance the potential of future biomedical and translational research. Establishing and maintaining the cyberinfrastructure will require a fundamental change in the way researchers think about study design, collaboration, and data storage and analysis.

    • "The benefits of harmonizing and pooling research databases are numerous. Integrating harmonized data from different populations allows achieving sample sizes that could not be obtained with individual studies [1-4], improves the generalizability of results [3-5], helps ensure the validity of comparative research [6,7], encourages more efficient secondary usage of existing data [8], and provides opportunities for collaborative and multi-centre research [9-12]. Governments, funders, and researchers alike have been stressing the importance of harmonization and collaborative use of data and samples in the population health and biobanking fields over the past half-decade [13-21]. "
    ABSTRACT: Individual-level data pooling of large population-based studies across research centres in international research projects faces many hurdles. The BioSHaRE (Biobank Standardisation and Harmonisation for Research Excellence in the European Union) project aims to address these issues by building a collaborative group of investigators and developing tools for data harmonization, database integration and federated data analyses. Eight population-based studies in six European countries were recruited to participate in the BioSHaRE project. Through workshops, teleconferences and electronic communications, participating investigators identified a set of 96 variables targeted for harmonization to answer research questions of interest. Using each study's questionnaires, standard operating procedures, and data dictionaries, harmonization potential was assessed. Whenever harmonization was deemed possible, processing algorithms were developed and implemented in an open-source software infrastructure to transform study-specific data into the target (i.e. harmonized) format. Harmonized datasets, located on servers at research centres across Europe, were interconnected through a federated database system to perform statistical analysis. Retrospective harmonization led to the generation of common-format variables for 73% of matches considered (96 targeted variables across 8 studies). Authenticated investigators can now perform complex statistical analyses of harmonized datasets stored on distributed servers without actually sharing individual-level data using the DataSHIELD method. New Internet-based networking technologies and database management systems are providing the means to support collaborative, multi-centre research in an efficient and secure manner. The results from this pilot project show that, given a strong collaborative relationship between participating studies, it is possible to seamlessly co-analyse internationally harmonized research databases while allowing each study to retain full control over individual-level data. We encourage additional collaborative research networks in epidemiology, public health, and the social sciences to make use of the open-source tools presented herein.
    Emerging Themes in Epidemiology 11/2013; 10(1):12. DOI:10.1186/1742-7622-10-12 · 2.59 Impact Factor

  • American journal of preventive medicine 05/2011; 40(5 Suppl 2):S245-8. · 4.53 Impact Factor
  • American journal of preventive medicine 05/2011; 40(5 Suppl 2):S97-102. DOI:10.1016/j.amepre.2011.01.006 · 4.53 Impact Factor