PhenX: a toolkit for interdisciplinary genetics research

Cornell University, Ithaca, New York, USA.
Current opinion in lipidology (Impact Factor: 5.8). 04/2010; 21(2):136-40. DOI: 10.1097/MOL.0b013e3283377395
Source: PubMed

ABSTRACT This review highlights standard PhenX (consensus measures for Phenotypes and eXposures) measures for nutrition, dietary supplements, and cardiovascular disease research and demonstrates how these and other PhenX measures can be used to further interdisciplinary genetics research.
PhenX addresses the need for standard measures in large-scale genomic research studies by providing investigators with high-priority, well-established, low-burden measurement protocols in a web-based toolkit. Cardiovascular and Nutrition and Dietary Supplements are just 2 of the 21 research domains, with accompanying measures, included in the PhenX Toolkit.
Genome-wide association studies (GWAS) provide promise for the identification of genomic markers associated with different disease phenotypes, but require replication to validate results. Cross-study comparisons typically increase statistical power and are required to understand the roles of comorbid conditions and environmental factors in the progression of disease. However, the lack of comparable phenotypic, environmental, and risk factor data forces investigators to infer and to compare metadata rather than directly combining data from different studies. PhenX measures provide a common currency for collecting data, thereby greatly facilitating cross-study analysis and increasing statistical power for identification of associations between genotypes, phenotypes, and exposures.

  • ABSTRACT: Objectives: Classifying the data elements (DEs) used in clinical documents is challenging, even across ISO/IEC 11179-compliant clinical metadata registries (MDRs), because no reliable standard exists for identifying DEs. We propose the Clinical Data Element Ontology (CDEO) for unified indexing and retrieval of DEs across MDRs. Methods: The CDEO was developed through harmonization of existing clinical document models and empirical analysis of MDRs. To classify DEs at the level of the data element concept (DEC), the Simple Knowledge Organization System was chosen to represent and organize the DECs. Six basic requirements that the CDEO must meet were also set, including that the indexing target be a DEC and that DECs be organized using their semantic relationships. To evaluate the CDEO, three indexers mapped 400 DECs to more than 1 CDEO term in order to determine whether the CDEO produces a consistent index for a given DEC. The level of agreement among the indexers was determined by calculating the intraclass correlation coefficient (ICC). Results: We developed the CDEO with 578 concepts. The usability of the CDEO was evaluated through two application use-case scenarios, and it fully met all of the considered requirements. The ICC among the three indexers was estimated to be 0.59 (95% confidence interval, 0.52-0.66). Conclusions: The CDEO organizes DECs originating from different MDRs into a single unified conceptual structure. It enables highly selective search and retrieval of relevant DEs from multiple MDRs for clinical documentation and clinical research data aggregation.
    10/2014; 20(4):295-303. DOI:10.4258/hir.2014.20.4.295
  • ABSTRACT: Lack of standardization in representing phenotype data generated in different studies is a major barrier to data reuse for cross study analyses. To address this issue, we developed DIVER, a tool that identifies and standardizes demographic variables in dbGaP, based on simple natural language processing and standardized terminology mapping. In its evaluation using variables (N=3,565) from a range of pulmonary studies in dbGaP, DIVER proved to be an effective approach to standardizing dbGaP variables by successfully identifying demographic variables with high rates of recall and precision (98% and 94%, respectively). In addition, DIVER correctly modeled 79% of the identified demographic variables at the core semantic level. Examination of variables that DIVER could not handle shed light on where our tool needs enhancement so it can further improve its semantic modeling accuracy. DIVER is an important component of a system for phenotype discovery in dbGaP studies.
    Healthcare Informatics, Imaging and Systems Biology (HISB), 2012 IEEE Second International Conference on; 01/2012
  • ABSTRACT: Type 2 diabetes (T2D) is thought to arise from an interaction between susceptibility genes and a diabetogenic environment. This review summarizes progress pertaining specifically to gene-diet interactions. Recent efforts have been population-based and have focused on established genetic and dietary risk factors for T2D. TCF7L2 × carbohydrate-quality and IRS1 × macronutrient-composition interactions are promising factors, but most studies of gene-diet interactions are conflicting or need follow-up. T2D genetic risk scores are powerful predictors of developing T2D, but whether they can be combined with dietary risk factors merits further study. Lack of statistical power, imprecise diet measures, and conceptual issues surrounding replication all challenge our efforts to characterize interactions. Collaborations are needed for optimal study designs in both hypothesis-testing and hypothesis-generating contexts. Continued investment in studies of gene-diet interactions may lead to novel mechanistic insights into T2D, opportunities for risk stratification, and ultimately to personalized nutrition to prevent the disease.
    12/2014; 3(4). DOI:10.1007/s13668-014-0095-1