Bioinformatics: Biomarkers of early detection
ABSTRACT: Capturing, sharing, and publishing cancer biomarker research data are fundamental challenges in enabling new opportunities to research and understand scientific data. Informatics experts from the National Cancer Institute's (NCI) Early Detection Research Network (EDRN) have pioneered a principled informatics infrastructure to capture and disseminate data from biomarker validation studies, in effect providing a successful, national-scale, real-world example of how to address these challenges. EDRN is a distributed, collaborative network, and it requires its infrastructure to support research across cancer research institutions and across their individual laboratories. The EDRN informatics infrastructure is also referred to as the EDRN Knowledge Environment, or EKE. EKE connects information about biomarkers, studies, specimens and resulting scientific data, allowing users to search, download and compare each of these disparate sources of cancer research information. EKE's data are enriched with annotations that describe the research results (biomarkers, protocols, studies) and that link the research results to the captured information within EDRN (raw instrument datasets, specimens, etc.). In addition, EKE provides external links to public resources related to the research results and captured data. EKE has leveraged and reused data management software technologies originally developed for planetary and earth science research results and has infused those capabilities into biomarker research. This paper describes the EDRN Knowledge Environment, its deployment to the EDRN enterprise, and how a number of these challenges have been addressed through the capture and curation of biomarker data results.
ABSTRACT: Years of sequence feature curation by UniProtKB/Swiss-Prot, PIR-PSD, NCBI-CDD, RefSeq and other database biocurators have led to a rich repository of information on functional sites of genes and proteins. This information, along with variation-related annotation, can be used to scan human short sequence reads from next-generation sequencing (NGS) pipelines for the presence of non-synonymous single-nucleotide variations (nsSNVs) that affect functional sites. This and similar workflows are becoming more important because thousands of NGS data sets are being made available through projects such as The Cancer Genome Atlas (TCGA), and researchers want to evaluate their biomarkers in genomic data. BioMuta, an integrated sequence feature database, provides a framework for automated and manual curation and integration of cancer-related sequence features so that they can be used in NGS analysis pipelines. Sequence feature information in BioMuta is collected from the Catalogue of Somatic Mutations in Cancer (COSMIC), ClinVar and UniProtKB, and through biocuration of information available from publications. Additionally, nsSNVs identified through automated analysis of NGS data from TCGA are included in the database. Because of the petabytes of data and information present in NGS primary repositories, a platform, HIVE (High-performance Integrated Virtual Environment), has been developed for storing, analyzing, computing and curating NGS data and associated metadata. Using HIVE, 31 979 nsSNVs were identified in TCGA-derived NGS data from breast cancer patients. All variations identified through this process are stored in a Curated Short Read archive, and the nsSNVs from the tumor samples are included in BioMuta. Currently, BioMuta has 26 cancer types with 13 896 small-scale and 308 986 large-scale study-derived variations. Integration of variation data allows identification of novel or common nsSNVs that can be prioritized in validation studies.
Database URL: BioMuta: http://hive.biochemistry.gwu.edu/tools/biomuta/index.php; CSR: http://hive.biochemistry.gwu.edu/dna.cgi?cmd=csr; HIVE: http://hive.biochemistry.gwu.edu
Database: The Journal of Biological Databases and Curation 02/2014; 2014:bau022. DOI:10.1093/database/bau022 · 4.46 Impact Factor
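The screening step described in the BioMuta abstract above, checking whether non-synonymous variations fall on annotated functional sites, can be illustrated with a minimal Python sketch. This is not the BioMuta or HIVE implementation. The record types, field names, accessions (`PROT_A`, `PROT_B`) and all data values are hypothetical. The sketch also assumes each variant has already been mapped to a protein position with reference and alternate amino acids.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class FunctionalSite:
    """Hypothetical functional-site annotation on a protein."""
    accession: str    # protein identifier (made-up values below)
    start: int        # 1-based, inclusive
    end: int          # 1-based, inclusive
    description: str


@dataclass(frozen=True)
class Variant:
    """Hypothetical variant already mapped to protein coordinates."""
    accession: str
    position: int     # 1-based amino acid position
    ref_aa: str
    alt_aa: str

    def is_nonsynonymous(self) -> bool:
        # A variant is non-synonymous when the amino acid changes.
        return self.ref_aa != self.alt_aa


def variants_at_functional_sites(
    variants: List[Variant], sites: List[FunctionalSite]
) -> List[Tuple[Variant, FunctionalSite]]:
    """Return (variant, site) pairs where a non-synonymous variant
    lies inside an annotated functional site on the same protein."""
    hits = []
    for variant in variants:
        if not variant.is_nonsynonymous():
            continue
        for site in sites:
            same_protein = site.accession == variant.accession
            if same_protein and site.start <= variant.position <= site.end:
                hits.append((variant, site))
    return hits


# Made-up illustration data, not taken from any database.
sites = [
    FunctionalSite("PROT_A", 50, 60, "active site"),
    FunctionalSite("PROT_B", 10, 12, "binding site"),
]
variants = [
    Variant("PROT_A", 55, "R", "H"),   # non-synonymous, inside a site
    Variant("PROT_A", 55, "R", "R"),   # synonymous, skipped
    Variant("PROT_A", 100, "G", "D"),  # non-synonymous, outside any site
    Variant("PROT_B", 11, "K", "E"),   # non-synonymous, inside a site
]

hits = variants_at_functional_sites(variants, sites)
for variant, site in hits:
    print(f"{variant.accession} {variant.ref_aa}{variant.position}"
          f"{variant.alt_aa} affects {site.description}")
```

The nested loop is adequate for a small illustration. At the data volumes mentioned in the abstract, an interval index keyed by protein accession would be the more likely design choice.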
ABSTRACT: Recent discoveries in cancer biology have greatly increased the understanding of cancer at the molecular level, but translating this knowledge into clinically useful diagnostic tests has proved challenging. More efficient transfer of new molecular tests into patient care requires better standardization of laboratory practices, measurement methods and data management. The workshop assembled experts from the National Cancer Institute, the US FDA, the National Institute of Standards and Technology, academia and industry to address the most efficient approaches to biomarker standardization and validation. The workshop participants described the current state of research in molecular diagnostics standardization and addressed three questions: What has worked? What has not? And what needs improving?
Expert Review of Molecular Diagnostics 06/2013; 13(5):421-3. DOI:10.1586/erm.13.28 · 4.27 Impact Factor
ABSTRACT: Personalized medicine promises patient-tailored treatments that enhance patient care and decrease overall treatment costs by focusing on genetics and "-omics" data obtained from patient biospecimens and records to guide therapy choices that generate good clinical outcomes. The approach relies on diagnostic and prognostic use of novel biomarkers discovered through combinations of tissue banking, bioinformatics, and electronic medical records (EMRs). The analytical power of bioinformatic platforms combined with patient clinical data from EMRs can reveal potential biomarkers and clinical phenotypes that allow researchers to develop experimental strategies using selected patient biospecimens stored in tissue banks. For cancer, high-quality biospecimens collected at diagnosis, first relapse, and various treatment stages provide crucial resources for study designs. To enlarge biospecimen collections, patient education regarding the value of specimen donation is vital. One approach for increasing consent is to offer publicly available illustrations and game-like engagements demonstrating how wider sample availability facilitates development of novel therapies. The critical value of tissue bank samples, bioinformatics, and EMRs in the early stages of the biomarker discovery process for personalized medicine is often overlooked. The data obtained also require cross-disciplinary collaborations to translate experimental results into clinical practice and into diagnostic and prognostic use in personalized medicine.
Journal of Oncology 05/2013; 2013:368751. DOI:10.1155/2013/368751