ABSTRACT: There is a critical need for standard approaches to assess, report, and
compare the technical performance of genome-scale differential gene expression
experiments. We assess technical performance with a proposed "standard"
dashboard of metrics derived from analysis of external spike-in RNA control
ratio mixtures. These control ratio mixtures with defined abundance ratios
enable assessment of diagnostic performance of differentially expressed
transcript lists, limit of detection of ratio (LODR) estimates, and expression
ratio variability and measurement bias. The performance metrics suite is
applicable to analysis of a typical experiment, and here we also apply these
metrics to evaluate technical performance among laboratories. An
interlaboratory study using identical samples shared amongst 12 laboratories
with three different measurement processes demonstrated generally consistent
diagnostic power across 11 laboratories. Ratio measurement variability and bias
were also comparable amongst laboratories for the same measurement process.
Different biases were observed for measurement processes using different mRNA
ABSTRACT: The aim of this review is to comprehensively summarize the recent achievements in the field of toxicogenomics and cancer research regarding genetic-environmental interactions in carcinogenesis and detection of genetic aberrations in cancer genomes by next-generation sequencing technology. Cancer is primarily a genetic disease in which genetic factors and environmental stimuli interact to cause genetic and epigenetic aberrations in human cells. Mutations in the germline act as either high-penetrance alleles that strongly increase the risk of cancer development, or as low-penetrance alleles that mildly change an individual's susceptibility to cancer. Somatic mutations, resulting from either DNA damage induced by exposure to environmental mutagens or from spontaneous errors in DNA replication or repair, are involved in the development or progression of the cancer. Induced or spontaneous changes in the epigenome may also drive carcinogenesis. Advances in next-generation sequencing technology provide opportunities to accurately, economically, and rapidly identify genetic variants, somatic mutations, gene expression profiles, and epigenetic alterations with single-base resolution. Whole genome sequencing, whole exome sequencing, and RNA sequencing of paired cancer and adjacent normal tissue present a comprehensive picture of the cancer genome. These new findings should benefit public health by providing insights into cancer biology and by improving cancer diagnosis and therapy.
Journal of Environmental Science and Health Part C Environmental Carcinogenesis & Ecotoxicology Reviews 04/2014; 32(2):121-58. · 3.23 Impact Factor
ABSTRACT: The FDA Adverse Event Reporting System (FAERS) is a database for post-marketing drug safety monitoring and influences FDA safety guidance documents, such as changes in drug labels. The number of cases in the FAERS has rapidly increased with the improvement of submission methods and data standards, and the database has thus become an important resource for regulatory science. While the FAERS has been predominantly used for safety signal detection, this study explored its utility for studying disease characteristics.
Clinical Pharmacology & Therapeutics (2014); Accepted article preview online 21 January 2014; doi:10.1038/clpt.2014.17.
ABSTRACT: Most timely, complete, and up-to-date information is available
34 comprehensively written chapters that cover all aspects of pharmacogenomics and its disease-specific applications
Targeted for a broad spectrum of readers
“Omics for Personalized Medicine” gives its readers insight into both the current developments and the future potential of personalized medicine. The book highlights how pharmacogenomics and omics technologies are transforming medicine and the health care sector for the better. Students of biomedical research and medicine, along with medical professionals, will benefit from the diverse knowledge of personalized medicine presented in the book's detailed chapters. The chapters are divided into two sections for convenient reading. The first section covers the general aspects of pharmacogenomic technology, including the latest research and development in omics technologies; it also highlights the role of omics in modern clinical trials and discusses ethical considerations in pharmacogenomics. The second section focuses on the development of personalized medicine in several areas of human health. The topics covered range from metabolic and neurological disorders to non-communicable as well as infectious diseases, and include the role of pharmacogenomics in cell therapy and transplantation technology. The thirty-four chapters cover many aspects of pharmacogenomics and personalized medicine and take into consideration the varied interests of readers from different fields of biomedical research and medicine. The advent of pharmacogenomics, the culmination of decades of research, is shaping the future of modern medicine. The book is an honest endeavour of researchers from all over the world to disseminate the latest knowledge and know-how in personalized medicine to community health researchers in particular and the educated public in general.
ABSTRACT: Endocrine active chemicals can potentially have adverse effects on both humans and wildlife. They can interfere with the body's endocrine system through direct or indirect interactions with many protein targets. Estrogen receptors (ERs) are one of the major targets, and many endocrine disruptors are estrogenic and affect the normal estrogen signaling pathways. However, ERs can also serve as therapeutic targets for various medical conditions, such as menopausal symptoms, osteoporosis, and ER-positive breast cancer. Because of the decades-long interest in the safety and therapeutic utility of estrogenic chemicals, a large number of chemicals have been assayed for estrogenic activity, but these data exist in various sources and different formats that restrict the ability of regulatory and industry scientists to utilize them fully for assessing risk-benefit. To address this issue, we have developed an Estrogenic Activity Database (EADB) (http://www.fda.gov/ScienceResearch/BioinformaticsTools/EstrogenicActivityDatabaseEADB/default.htm) and made it freely available to the public. EADB contains 18114 estrogenic activity data points collected for 8212 chemicals tested in 1284 binding, reporter gene, cell proliferation, and in vivo assays in 11 different species. The chemicals cover a broad chemical structure space and the data span a wide range of activities. A set of tools allows users to access EADB and evaluate potential endocrine activity of chemicals. As a case study, a classification model was developed using EADB for predicting ER binding of chemicals.
ABSTRACT: Abacavir is an effective nucleoside analog reverse transcriptase inhibitor used to treat human immunodeficiency virus (HIV) infected patients. Its main side effect is hypersensitivity reaction (HSR). The incidence of the HSR is associated with ethnicity among patients exposed to abacavir, and retrospective and prospective studies show a significantly increased risk of abacavir-induced HSR in human leukocyte antigen (HLA)-B*57:01-carrying patients. Immunological studies indicate that abacavir interacts specifically with HLA-B*57:01 and changes the binding specificity between the HLA molecule and the HLA-presented endogenous peptide repertoire, leading to a systemic autoimmune reaction. HLA-B*57:01 screening, combined with patch testing, has clinically predictive value and is cost-effective in reducing the incidence of abacavir-induced HSR regardless of the HLA-B*57:01 prevalence in the population. Therefore, the US Food and Drug Administration (FDA) and international HIV treatment guidelines recommend routine HLA-B*57:01 screening prior to abacavir treatment to decrease false positive diagnosis and prevent abacavir-induced HSR. The studies of abacavir-induced HSR and the implementation of HLA-B*57:01 screening in the clinic represent a successful example of the use of pharmacogenetics for personalized diagnosis and therapy.
Science China. Life sciences 02/2013; 56(2):119-24. · 1.51 Impact Factor
ABSTRACT: Realizing personalized medicine requires integrating diverse data types with bioinformatics. The most vital data at present are individuals' genomic information from advanced next-generation sequencing (NGS) technologies. The technologies continue to advance in terms of both decreasing cost and increasing sequencing speed, with a concomitant increase in the amount and complexity of the data. The prodigious data, together with the requisite computational pipelines for data analysis and interpretation, strain both the IT infrastructure and the scientists conducting the work. Bioinformatics is increasingly becoming the rate-limiting step, with numerous challenges to be overcome in translating NGS data for personalized medicine. We review some key bioinformatics tasks, issues, and challenges in the contexts of IT requirements, data quality, analysis tools and pipelines, and validation of biomarkers.
Science China. Life sciences 02/2013; 56(2):110-8. · 1.51 Impact Factor
ABSTRACT: During the last several years, high-density genotyping SNP arrays have facilitated genome-wide association studies (GWAS) that successfully identified common genetic variants associated with a variety of phenotypes. However, each of the identified genetic variants explains only a very small fraction of the underlying genetic contribution to the studied phenotypic trait. Moreover, discordance observed in results between independent GWAS indicates the potential for Type I and II errors. High reliability of genotyping technology is needed to have confidence in using SNP data and interpreting GWAS results. Therefore, reproducibility of two widely used genotyping technology platforms from Affymetrix and Illumina was assessed by analyzing four technical replicates from each of six individuals in five laboratories. Genotype concordance of 99.40% to 99.87% within a laboratory for the same platform, 98.59% to 99.86% across laboratories for the same platform, and 98.80% across genotyping platforms was observed. Moreover, arrays with low quality data were detected when comparing genotyping data from technical replicates, but they could not be detected according to vendors' quality control (QC) suggestions. Our results demonstrated the technical reliability of currently available genotyping platforms but also indicated the importance of incorporating some technical replicates for genotyping QC in order to improve the reliability of GWAS results. The impact of discordant genotypes on association analysis results was simulated and could explain, at least in part, the irreproducibility of some GWAS findings when the effect size (i.e. the odds ratio) and the minor allele frequencies are low.
PLoS ONE 09/2012; 7(9):e44483. · 3.53 Impact Factor
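The genotype concordance figures quoted in the abstract above can be illustrated with a minimal sketch. It is not the study's pipeline: the function name, the two-letter call encoding, the `"NN"` no-call marker, and the decision to skip no-calls are all assumptions made for illustration.

```python
def genotype_concordance(calls_a, calls_b, missing="NN"):
    """Fraction of SNPs with identical genotype calls in two replicates.

    calls_a, calls_b: dicts mapping SNP id -> two-letter genotype call.
    SNPs absent from either replicate, or marked as a no-call (assumed
    to be encoded as `missing`), are skipped rather than counted.
    """
    compared = 0
    matching = 0
    for snp, call_a in calls_a.items():
        call_b = calls_b.get(snp)
        if call_b is None or missing in (call_a, call_b):
            continue
        compared += 1
        # Allele order is irrelevant: "AG" and "GA" are the same genotype.
        if sorted(call_a) == sorted(call_b):
            matching += 1
    if compared == 0:
        raise ValueError("no SNPs with calls in both replicates")
    return matching / compared


# Toy data for two hypothetical technical replicates.
replicate_1 = {"rs1": "AA", "rs2": "AG", "rs3": "GG", "rs4": "NN", "rs5": "CT"}
replicate_2 = {"rs1": "AA", "rs2": "GA", "rs3": "AG", "rs4": "CC", "rs5": "CT"}

concordance = genotype_concordance(replicate_1, replicate_2)
print(f"{100 * concordance:.2f}% concordant")
```

Whether no-calls are excluded or counted as discordant changes the reported percentage, so the choice should be stated alongside any concordance value.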
ABSTRACT: Large amounts of mammalian protein-protein interaction (PPI) data have been generated and are available for public use. From a systems biology perspective, protein/gene interactions encode the key mechanisms distinguishing disease and health, and such mechanisms can be uncovered through network analysis. An effective network analysis tool should integrate different content-specific PPI databases into a comprehensive network format with a user-friendly platform to identify key functional modules/pathways and the underlying mechanisms of disease and toxicity.
atBioNet integrates seven publicly available PPI databases into a network-specific knowledge base. Knowledge expansion is achieved by expanding a user-supplied protein/gene list with interactions from its integrated PPI network. The statistically significant functional modules are determined by applying a fast network-clustering algorithm (SCAN: a Structural Clustering Algorithm for Networks). The functional modules can be visualized either separately or together in the context of the whole network. Integration of pathway information enables enrichment analysis and assessment of the biological function of modules. Three case studies are presented using publicly available disease gene signatures as a basis to discover new biomarkers for acute leukemia, systemic lupus erythematosus, and breast cancer. The results demonstrated that atBioNet can not only identify functional modules and pathways related to the studied diseases, but also provide information that can be used to hypothesize novel biomarkers for future analysis.
atBioNet is a free web-based network analysis tool that provides systematic insight into protein/gene interactions through examining significant functional modules. The identified functional modules are useful for determining underlying mechanisms of disease and for biomarker discovery. It can be accessed at: http://www.fda.gov/ScienceResearch/BioinformaticsTools/ucm285284.htm.
ABSTRACT: The Liver Toxicity Biomarker Study is a systems toxicology approach to discover biomarkers that are indicative of a drug's potential to cause human idiosyncratic drug-induced liver injury. In phase I, the molecular effects in rat liver and blood plasma induced by tolcapone (a "toxic" drug) were compared with the molecular effects induced in the same tissues by entacapone (a "clean" drug, similar to tolcapone in chemical structure and primary pharmacological mechanism). Two durations of drug exposure, 3 and 28 days, were employed. Comprehensive molecular analysis of rat liver and plasma samples yielded marker analytes for various drug-vehicle or drug-drug comparisons. An important finding was that the marker analytes associated with tolcapone only partially overlapped with marker analytes associated with entacapone, despite the fact that both drugs have similar chemical structures and the same primary pharmacological mechanism of action. This result indicates that the molecular analyses employed in the study are detecting substantial "off-target" markers for the two drugs. An additional interesting finding was the modest overlap of the marker data sets for 3-day exposure and 28-day exposure, indicating that the molecular changes in liver and plasma caused by short- and long-term drug treatments share few common characteristics.
ABSTRACT: Circulating microRNAs (miRNAs) have emerged as novel noninvasive biomarkers for several diseases and other types of tissue injury. This study tested the hypothesis that changes in the levels of urinary miRNAs correlate with liver injury induced by hepatotoxicants. Sprague-Dawley rats were administered one of two hepatotoxicants, acetaminophen (APAP) or carbon tetrachloride (CCl4), or one nonhepatotoxicant (penicillin/PCN). Urine samples were collected over a 24 h period after a single oral dose of APAP (1250 mg/kg), CCl4 (2000 mg/kg), or PCN (2400 mg/kg). APAP and CCl4 induced liver injury based upon increased serum alanine and aspartate aminotransferase levels and histopathological findings, including liver necrosis. APAP and CCl4 significantly increased the urinary levels of 44 and 28 miRNAs, respectively. In addition, 10 of the increased miRNAs were in common between APAP and CCl4. In contrast, PCN caused a slight decrease of a different nonoverlapping set of urinary miRNAs. Cluster analysis revealed a distinct urinary miRNA pattern from the hepatotoxicant-treated groups when compared with vehicle controls and PCN. Analysis of hepatic miRNA levels suggested that the liver was the source of the increased urinary miRNAs after APAP exposure; however, the results from CCl4 were equivocal. Computational analysis was used to predict target genes of the 10 shared hepatotoxicant-induced miRNAs. Liver gene expression profiling using whole genome microarrays identified eight putative miRNA target genes that were significantly altered in the liver of APAP- and CCl4-treated animals. In conclusion, the patterns of urinary miRNA may hold promise as biomarkers of hepatotoxicant-induced liver injury.
ABSTRACT: RNA-Seq has been increasingly used for the quantification and characterization of transcriptomes. The ongoing development of the technology promises more accurate measurement of gene expression. However, its benefits over widely accepted microarray technologies have not been adequately assessed, especially in toxicogenomics studies. The goal of this study is to enhance the scientific community's understanding of the advantages and challenges of RNA-Seq in the quantification of gene expression by comparing analysis results from RNA-Seq and microarray data on a toxicogenomics study. A typical toxicogenomics study design was used to compare the performance of an RNA-Seq approach (Illumina Genome Analyzer II) to a microarray-based approach (Affymetrix Rat Genome 230 2.0 arrays) for detecting differentially expressed genes (DEGs) in the kidneys of rats treated with aristolochic acid (AA), a carcinogenic and nephrotoxic chemical most notably used for weight loss. We studied the comparability of the RNA-Seq and microarray data in terms of absolute gene expression, gene expression patterns, differentially expressed genes, and biological interpretation. We found that RNA-Seq was more sensitive in detecting genes with low expression levels, while similar gene expression patterns were observed for both platforms. Moreover, although the overlap of the DEGs was only 40-50%, the biological interpretation was largely consistent between the RNA-Seq and microarray data. RNA-Seq maintained a consistent biological interpretation with time-tested microarray platforms while generating more sensitive results. However, there is clearly a need for future investigations to better understand the advantages and limitations of RNA-Seq in toxicogenomics studies and environmental health research.
Chemical Research in Toxicology 08/2011; 24(9):1486-93. · 4.19 Impact Factor
ABSTRACT: DNA sequencing is a powerful approach for decoding a number of human diseases, including cancers. The advent of next-generation sequencing (NGS) technologies has reduced sequencing cost by orders of magnitude and significantly increased the throughput, making whole-genome sequencing a feasible way to obtain global genomic information about patients on whom clinical actions may be taken. However, the benefits offered by NGS technologies come with a number of challenges that must be adequately addressed before they can be transformed from research tools to routine clinical practice. This article provides an overview of four commonly used NGS technologies from Roche Applied Science/454 Life Sciences, Illumina, Life Technologies, and Helicos Biosciences. The challenges in the analysis of NGS data and their potential applications in clinical diagnosis are also discussed.
ABSTRACT: Protein-protein interactions (PPIs) are a critical component of many underlying biological processes. A PPI network can provide insight into the mechanisms of these processes, as well as the relationships among different proteins and toxicants that are potentially involved in the processes. There are many PPI databases publicly available, each with a specific focus. The challenge is how to effectively combine their contents to generate a robust and biologically relevant PPI network.
In this study, seven public PPI databases (BioGRID, DIP, HPRD, IntAct, MINT, REACTOME, and SPIKE) were used to explore an approach for combining multiple PPI databases into an integrated PPI network. We developed a novel method called k-votes to create seven different integrated networks by using values of k ranging from 1 to 7. Functional modules were mined by using SCAN, a Structural Clustering Algorithm for Networks. Overall module qualities were evaluated for each integrated network using the following statistical and biological measures: (1) modularity, (2) similarity-based modularity, (3) clustering score, and (4) enrichment.
Each integrated human PPI network was constructed based on the number of votes (k) for a particular interaction from the committee of the original seven PPI databases. The performance of functional modules obtained by SCAN from each integrated network was evaluated. The optimal value for k was determined by the functional module analysis. Our results demonstrate that the k-votes method outperforms the traditional union approach in terms of both statistical significance and biological meaning. The best network is achieved at k = 2, which is composed of interactions that are confirmed in at least two PPI databases. In contrast, the traditional union approach yields an integrated network that consists of all interactions of seven PPI databases, which might be subject to high false positives.
We determined that the k-votes method for constructing a robust PPI network by integrating multiple public databases outperforms previously reported approaches and that a value of k=2 provides the best results. The developed strategies for combining databases show promise in the advancement of network construction and modeling.
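The voting rule described in the abstract above (keep an interaction only if at least k source databases report it) can be sketched in a few lines. This is an illustrative sketch, not the authors' implementation: the function name, the toy databases, and the treatment of interactions as undirected pairs are assumptions.

```python
from collections import Counter


def k_votes(databases, k):
    """Return the interactions reported by at least k of the databases.

    databases: iterable of interaction lists, one list per database,
    each interaction a (protein_a, protein_b) pair.
    """
    votes = Counter()
    for interactions in databases:
        # Assume interactions are undirected and let each database
        # cast at most one vote per interaction.
        unique_edges = {frozenset(pair) for pair in interactions}
        votes.update(unique_edges)
    return {edge for edge, count in votes.items() if count >= k}


# Three hypothetical databases with partially overlapping content.
db_a = [("P1", "P2"), ("P2", "P3")]
db_b = [("P2", "P1"), ("P3", "P4")]
db_c = [("P1", "P2"), ("P3", "P4"), ("P4", "P5")]

union_network = k_votes([db_a, db_b, db_c], k=1)  # traditional union
k2_network = k_votes([db_a, db_b, db_c], k=2)     # confirmed by >= 2 sources

print(len(union_network), len(k2_network))
```

With k = 1 the rule reduces to the union approach that the abstract compares against; raising k trades coverage for confirmation by multiple sources.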
ABSTRACT: As a powerful tool for genome-wide gene expression analysis, DNA microarray technology is widely used in biomedical research.
One important application of microarrays is to identify differentially expressed genes (DEGs) between two distinct biological
conditions, e.g. disease versus normal or treatment versus control, so that the underlying molecular mechanism differentiating
the two conditions may be revealed. Mechanistic interpretation of microarray results requires the identification of reproducible
and reliable lists of DEGs, because irreproducible lists of DEGs may lead to different biological conclusions. Many vendors
are providing microarray platforms of different characteristics for gene expression analysis, and the widely publicized apparent
lack of intra- and cross-platform concordance in DEGs from microarray analysis of the same sets of study samples has been
of great concern to the scientific community and regulatory agencies like the US Food and Drug Administration (FDA). In this
chapter, we describe the study design of and the main findings from the FDA-led MicroArray Quality Control (MAQC) project
that aims to objectively assess the performance of different microarray platforms and the advantages and limitations of various
competing statistical methods in identifying DEGs from microarray data. Using large data sets generated on two human reference
RNA samples established by the MAQC project, we show that the levels of concordance in inter-laboratory and cross-platform
comparisons are generally high. Furthermore, the levels of concordance largely depend on the statistical criteria used for
ranking and selecting DEGs, irrespective of the chosen platforms or test sites. Importantly, a straightforward method combining
fold-change ranking with a non-stringent P-value cutoff produces more reproducible lists of DEGs than those by t-test P-value ranking. Similar conclusions are reached when microarray data sets from a rat toxicogenomics study are analyzed. The
availability of the MAQC reference RNA samples and the large reference data sets provides a unique resource for the gene expression
community to reach consensus on the “best practices” for the generation, analysis, and applications of microarray data in
drug development and personalized medicine.
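The selection rule named in the abstract above, fold-change ranking combined with a non-stringent P-value cutoff, can be sketched as follows. The function name, the input layout, the 0.05 cutoff, and the list size are assumptions for illustration; the abstract does not give the exact thresholds used.

```python
def select_degs(results, p_cutoff=0.05, top_n=100):
    """Rank genes by absolute log2 fold change after a loose P-value filter.

    results: dict mapping gene -> (log2_fold_change, p_value).
    p_cutoff and top_n are illustrative defaults, not values from the source.
    """
    # Step 1: non-stringent P-value filter to drop clearly noisy genes.
    passing = [(gene, lfc) for gene, (lfc, p) in results.items() if p < p_cutoff]
    # Step 2: rank the survivors by magnitude of fold change.
    passing.sort(key=lambda item: abs(item[1]), reverse=True)
    return [gene for gene, _ in passing[:top_n]]


# Toy results for five hypothetical genes.
results = {
    "G1": (3.0, 0.04),
    "G2": (0.2, 0.0001),  # tiny effect, very small P-value
    "G3": (-2.5, 0.01),
    "G4": (4.0, 0.30),    # large effect, fails the P-value filter
    "G5": (1.0, 0.02),
}

top_by_fold_change = select_degs(results, top_n=2)
top_by_p_value = sorted(results, key=lambda g: results[g][1])[:2]

print(top_by_fold_change, top_by_p_value)
```

The toy data show how the two rankings diverge: pure P-value ranking promotes G2 despite its negligible fold change, while the combined rule does not.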
ABSTRACT: Gene expression data from microarrays are being applied to predict preclinical and clinical endpoints, but the reliability of these predictions has not been established. In the MAQC-II project, 36 independent teams analyzed six microarray data sets to generate predictive models for classifying a sample with respect to one of 13 endpoints indicative of lung or liver toxicity in rodents, or of breast cancer, multiple myeloma or neuroblastoma in humans. In total, >30,000 models were built using many combinations of analytical methods. The teams generated predictive models without knowing the biological meaning of some of the endpoints and, to mimic clinical reality, tested the models on data that had not been used for training. We found that model performance depended largely on the endpoint and team proficiency and that different approaches generated models of similar performance. The conclusions and recommendations from MAQC-II should be useful for regulatory agencies, study committees and independent investigators that evaluate methods for global gene expression analysis.
ABSTRACT: Genome-wide association studies (GWAS) examine the entire human genome with the goal of identifying genetic variants (usually single nucleotide polymorphisms (SNPs)) that are associated with phenotypic traits such as disease status and drug response. The discordance of significantly associated SNPs for the same disease identified from different GWAS indicates that false associations exist in such results. In addition to the possible sources of spurious associations that have been investigated and discussed intensively, such as sample size and population stratification, an accurate and reproducible genotype calling algorithm is required for concordant GWAS results from different studies. However, variations in the genotype calls of an algorithm and their effects on significantly associated SNPs identified in downstream association analyses have not been systematically investigated. In this paper, the variations of genotype calling using the Bayesian Robust Linear Model with Mahalanobis distance classifier (BRLMM) algorithm and the resulting influence on the lists of significantly associated SNPs were evaluated by changing algorithmic parameters, using the raw data of 270 HapMap samples analysed with the Affymetrix Human Mapping 500K Array Set (Affy500K). The parameters modified were the Dynamic Model (DM) call confidence threshold (threshold) and the number of randomly selected SNPs (size). Comparative analysis of the calling results and the corresponding lists of significantly associated SNPs revealed that both the threshold and the size affected the called genotypes and the lists of significantly associated SNPs in association analysis. The effect of the threshold was much larger than the effect of the size. Moreover, the heterozygous calls had lower consistency compared to the homozygous calls.
Journal of Genetics 04/2010; 89(1):55-64. · 0.88 Impact Factor
ABSTRACT: The Affymetrix GeneChip® system is a commonly used platform for microarray analysis but the technology is inherently expensive. Unfortunately, changes in experimental planning and execution, such as the unavailability of previously anticipated samples or a shift in research focus, may render significant numbers of pre-purchased GeneChip® microarrays unprocessed before their manufacturer's expiration dates. Researchers and microarray core facilities wonder whether expired microarrays are still useful for gene expression analysis. In addition, it was not clear whether the two human reference RNA samples established by the MAQC project in 2005 still maintained their transcriptome integrity over a period of four years. Experiments were conducted to answer these questions.
Microarray data were generated in 2009 in three replicates for each of the two MAQC samples with either expired Affymetrix U133A or unexpired U133Plus2 microarrays. These results were compared with data obtained in 2005 on the U133Plus2 microarray. The percentage of overlap between the lists of differentially expressed genes (DEGs) from U133Plus2 microarray data generated in 2009 and in 2005 was 97.44%. While there was some degree of fold change compression in the expired U133A microarrays, the percentage of overlap between the lists of DEGs from the expired and unexpired microarrays was as high as 96.99%. Moreover, the microarray data generated using the expired U133A microarrays in 2009 were highly concordant with microarray and TaqMan® data generated by the MAQC project in 2005.
Our results demonstrated that microarray data generated using U133A microarrays, which were more than four years past the manufacturer's expiration date, were highly specific and consistent with those from unexpired microarrays in identifying DEGs despite some appreciable fold change compression and decrease in sensitivity. Our data also suggested that the MAQC reference RNA samples, stored at -80°C, were stable over a time frame of at least four years.
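The abstract above reports percentages of overlap between DEG lists without stating the formula. The sketch below assumes one common convention, the number of shared genes divided by the size of the smaller list; that definition and the function name are assumptions, and the study may have used a different one.

```python
def percent_overlap(list_a, list_b):
    """Percentage of overlapping genes between two DEG lists.

    Assumed definition: shared genes divided by the size of the smaller
    list, times 100. Other conventions (for example, dividing by the
    size of the union) give lower values for the same lists.
    """
    set_a, set_b = set(list_a), set(list_b)
    if not set_a or not set_b:
        raise ValueError("both gene lists must be non-empty")
    shared = set_a & set_b
    return 100.0 * len(shared) / min(len(set_a), len(set_b))


# Two hypothetical DEG lists from repeated experiments.
degs_run_1 = ["G1", "G2", "G3", "G4"]
degs_run_2 = ["G2", "G3", "G4", "G5", "G6"]

overlap = percent_overlap(degs_run_1, degs_run_2)
print(f"{overlap:.2f}% overlap")
```

Because the denominator changes the result, any reported overlap should name the convention used.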
ABSTRACT: Advances in microbial genomics and bioinformatics are offering greater insights into the emergence and spread of foodborne pathogens in outbreak scenarios. The Food and Drug Administration (FDA) has developed a genomics tool, ArrayTrack™, which provides extensive functionalities to manage, analyze, and interpret genomic data for mammalian species. ArrayTrack™ has been widely adopted by the research community and used for pharmacogenomics data review in the FDA's Voluntary Genomics Data Submission program.
ArrayTrack™ has been extended to manage and analyze genomics data from bacterial pathogens of human, animal, and food origin. It was populated with bioinformatics data from public databases such as NCBI, Swiss-Prot, KEGG Pathway, and Gene Ontology to facilitate pathogen detection and characterization. ArrayTrack™'s data processing and visualization tools were enhanced with analysis capabilities designed specifically for microbial genomics including flag-based hierarchical clustering analysis (HCA), flag concordance heat maps, and mixed scatter plots. These specific functionalities were evaluated on data generated from a custom Affymetrix array (FDA-ECSG) previously developed within the FDA. The FDA-ECSG array represents 32 complete genomes of Escherichia coli and Shigella. The new functions were also used to analyze microarray data focusing on antimicrobial resistance genes from Salmonella isolates in a poultry production environment using a universal antimicrobial resistance microarray developed by the United States Department of Agriculture (USDA).
The application of ArrayTrack™ to different microarray platforms demonstrates its utility in microbial genomics research, and thus will improve the capabilities of the FDA to rapidly identify foodborne bacteria and their genetic traits (e.g., antimicrobial resistance, virulence, etc.) during outbreak investigations. ArrayTrack™ is free to use and available to public, private, and academic researchers at http://www.fda.gov/ArrayTrack.