Article

Phenotype harmonization and cross-study collaboration in GWAS consortia: the GENEVA experience.

Collaborative Health Studies Coordinating Center, Department of Biostatistics, University of Washington, Seattle, Washington 98115, USA.
Genetic Epidemiology (Impact Factor: 2.95). 04/2011; 35(3):159-73. DOI: 10.1002/gepi.20564
Source: PubMed

ABSTRACT Genome-wide association study (GWAS) consortia and collaborations formed to detect genetic loci for common phenotypes or investigate gene-environment (G×E) interactions are increasingly common. While these consortia effectively increase sample size, phenotype heterogeneity across studies represents a major obstacle that limits successful identification of these associations. Investigators are faced with the challenge of how to harmonize previously collected phenotype data obtained using different data collection instruments, which cover topics in varying degrees of detail and over diverse time frames. This process has not been described in detail. We describe here some of the strategies and pitfalls associated with combining phenotype data from different studies. Using the Gene Environment Association Studies (GENEVA) multi-site GWAS consortium as an example, this paper provides an illustration to guide GWAS consortia through the process of phenotype harmonization and describes key issues that arise when sharing data across disparate studies. GENEVA is unusual in the diversity of its disease endpoints, so the issues it faces as its participating studies share data will be informative for many collaborations. Phenotype harmonization requires identifying common phenotypes, determining the feasibility of cross-study analysis for each, preparing common definitions, and applying appropriate algorithms. Other issues to be considered include genotyping time frames, coordination of parallel efforts by other collaborative groups, analytic approaches, and imputation of genotype data. GENEVA's harmonization efforts and policy of promoting data sharing and collaboration, not only within GENEVA but also with outside collaborations, can provide important guidance to ongoing and new consortia.
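
As a rough illustration of the harmonization steps named in the abstract (identify a common phenotype, check feasibility per study, agree on a common definition, apply a mapping algorithm), the Python sketch below recodes smoking status from two studies into a shared ever/never variable. It is only a sketch under stated assumptions: the study names, codings, and function names are hypothetical and are not taken from GENEVA.

```python
from typing import Dict, Hashable, Optional

# Hypothetical study-specific codings (assumptions for illustration only).
# study_a recorded three categories; study_b only asked "ever smoked?".
STUDY_CODINGS: Dict[str, Dict[Hashable, str]] = {
    "study_a": {1: "never", 2: "former", 3: "current"},
    "study_b": {"N": "never", "Y": "ever"},
}


def is_feasible(study: str) -> bool:
    """A study can contribute if its coding separates never-smokers from the rest."""
    return "never" in STUDY_CODINGS.get(study, {}).values()


def harmonize_smoking(study: str, raw_value: Hashable) -> Optional[str]:
    """Map a raw study code to the common definition: 'ever' or 'never'.

    The common definition is the coarsest one that every study supports,
    so former and current smokers in study_a both become 'ever'.
    Unrecognized or missing codes are returned as None (kept missing).
    """
    if study not in STUDY_CODINGS:
        raise KeyError(f"no coding registered for study {study!r}")
    native = STUDY_CODINGS[study].get(raw_value)
    if native is None:
        return None
    return "never" if native == "never" else "ever"
```

Choosing the coarsest shared definition loses detail (study_a's former/current split) but keeps every study in the pooled analysis; a consortium might prefer a finer definition and drop studies that cannot support it.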


Available from: Peter Kraft, Jul 04, 2015
  • Source
    ABSTRACT: The need for comprehensive analysis to compare and combine data across multiple studies in order to validate and extend results is widely recognized. This paper aims to assess the extent of data compatibility in the substance abuse and addiction (SAA) sciences through an examination of measure commonality, defined as the use of similar measures, across grants funded by the National Institute on Drug Abuse (NIDA) and the National Institute on Alcohol Abuse and Alcoholism (NIAAA). Data were extracted from applications of funded, active grants involving human-subjects research in four scientific areas (epidemiology, prevention, services, and treatment) and six frequently assessed scientific domains. A total of 548 distinct measures were cited across 141 randomly sampled applications. Commonality, as assessed by density (range of 0-1) of shared measurement, was examined. Results showed that commonality was low and varied by domain/area. Commonality was most prominent for (1) diagnostic interviews (structured and semi-structured) for substance use disorders and psychopathology (density of 0.88), followed by (2) scales to assess dimensions of substance use problems and disorders (0.70), (3) scales to assess dimensions of affect and psychopathology (0.69), (4) measures of substance use quantity and frequency (0.62), (5) measures of personality traits (0.40), and (6) assessments of cognitive/neurologic ability (0.22). The areas of prevention (density of 0.41) and treatment (0.42) had greater commonality than epidemiology (0.36) and services (0.32). To address the lack of measure commonality, NIDA and its scientific partners recommend and provide common measures for SAA researchers within the PhenX Toolkit.
    Drug and Alcohol Dependence 08/2014; 141. DOI:10.1016/j.drugalcdep.2014.04.029 · 3.28 Impact Factor
  • Source
    ABSTRACT: The PhenX Toolkit provides researchers with recommended, well-established, low-burden measures suitable for human subject research. The database of Genotypes and Phenotypes (dbGaP) is the data repository for a variety of studies funded by the National Institutes of Health, including genome-wide association studies. The dbGaP requires that investigators provide a data dictionary of study variables as part of the data submission process. Thus, dbGaP is a unique resource that can help investigators identify studies that share the same or similar variables. As a proof of concept, variables from 16 studies deposited in dbGaP were mapped to PhenX measures. Soon, investigators will be able to search dbGaP using PhenX variable identifiers and find comparable and related variables in these 16 studies. To enhance effective data exchange, PhenX measures, protocols, and variables were modeled in Logical Observation Identifiers Names and Codes (LOINC®). PhenX domains and measures are also represented in the Cancer Data Standards Registry and Repository (caDSR). Associating PhenX measures with existing standards (LOINC® and caDSR) and mapping to dbGaP study variables extends the utility of these measures by revealing new opportunities for cross-study analysis.
    Human Mutation 05/2012; 33(5):849-57. DOI:10.1002/humu.22074 · 5.05 Impact Factor
  • Source
    American Journal of Clinical Nutrition 03/2011; 93(4):681-3. DOI:10.3945/ajcn.111.012641 · 6.92 Impact Factor