About
233
Publications
35,289
Reads
24,169
Citations
Introduction
Co-evolution of chronic infections and the immune system.
Lyme disease, autoimmune diseases
Current institution
MiLaboratories
Current position
- CEO
Additional affiliations
January 1993 - June 1996
August 1988 - December 1992
March 2003 - November 2010
GeneGo, Inc.
Position
- CEO
Publications (233)
Analysis of gene co-expression networks is a powerful “data-driven” tool, invaluable for understanding cancer biology and mechanisms of tumor development. Yet, despite the completion of thousands of studies on cancer gene expression, there have been few attempts to normalize and integrate co-expression data from scattered sources in a concise “meta-analy...
Analysis of NGS and other sequencing data, gene variants, gene expression, proteomics, and other high-throughput (OMICs) data is challenging because of its biological complexity and high level of technical and biological noise. One way to deal with both problems is to perform analysis with a high fidelity annotated knowledgebase of protein interact...
The number of published findings in biomedicine increases continually. At the same time, the specifics of the domain's terminology complicate the retrieval of relevant publications. In the current research, we investigate the influence of term variability and ambiguity on a paper's likelihood of being retrieved. We obtained statistics that demonstr...
In this volume, expert practitioners present a compilation of methods of functional data analysis (often referred to as “systems biology”) and its applications in drug discovery, medicine, and basic disease research. It covers such important issues as the elucidation of protein, compound and gene interactions, as well as analytical tools, including...
Signalling pathway activation analysis is a powerful approach for extracting biologically relevant features from large-scale transcriptomic and proteomic data. However, modern pathway-based methods often fail to provide stable pathway signatures of a specific phenotype or reliable disease biomarkers. In the present study, we introduce the in silico...
Supplementary Figures 1-9, Supplementary Tables 1-3, Supplementary Notes 1-2 and Supplementary References.
Random forest performance by series. Performance metrics of the prediction models are calculated separately for each GEO series in the validation set, including TP - the number of responder samples identified as responders, FP - the number of non-responders identified as responders, FN - the number of responders identified as non-responders, TN - the number of...
Gene coexpression network analysis is a powerful “data-driven” approach essential for understanding cancer biology and mechanisms of tumor development. Yet, despite the completion of thousands of studies on cancer gene expression, there have been few attempts to normalize and integrate co-expression data from scattered sources in a concise “meta-an...
S2 Table contains 3,398 gene coexpression modules in the format of gene lists, including gene connectivity values.
Similar data in the matrix format for programmatic use are available on the web at http://wgcna-modules.appspot.com/.
(XLSX)
S3 Table provides functional annotation to gene coexpression modules.
(XLSX)
S4 Table contains supporting information for the extracellular matrix module case study.
(XLSX)
S1 Table characterizes datasets included in the study: platforms, number of samples, authors, etc.
(XLSX)
We analyzed the functionality and relative distribution of genetic variants across the complete Oryza sativa genome, using the dataset of 40 million single nucleotide polymorphisms (SNPs) from the 3,000 Rice Genomes Project (http://snp-seek.irri.org), the largest and highest density SNP collection for any higher plant. We have shown that the DNA-binding t...
Using a 3D co-culture model, we identified significant sub-type-specific changes in gene expression, metabolic, and therapeutic sensitivity profiles of breast cancer cells in contact with cancer-associated fibroblasts (CAFs). CAF-induced gene expression signatures predicted clinical outcome and immune-related differences in the microenvironment. We...
The term 'ancient DNA' (aDNA) is coming of age, with over 1,200 hits in the PubMed database, beginning in the early 1980s with the studies of 'molecular paleontology'. Rooted in cloning and limited sequencing of DNA from ancient remains during the pre-PCR era, the field has made incredible progress since the introduction of PCR and next-generation...
The Kets, an ethnic group in the Yenisei River basin, Russia, are considered the last nomadic hunter-gatherers of Siberia, and the Ket language has no transparent affiliation with any language family. We investigated connections between the Kets and Siberian and North American populations, with emphasis on the Mal'ta and Paleo-Eskimo ancient genomes us...
Many adaptive events in natural populations, as well as response to artificial selection, are caused by polygenic action. Under selective pressure, the adaptive traits can quickly respond via small allele frequency shifts spread across numerous loci. We hypothesize that a large proportion of current phenotypic variation between individuals may be b...
Background:
The length of a protein sequence is largely determined by its function. In certain species, it may be also affected by additional factors, such as growth temperature or acidity. In 2002, it was shown that in the bacterium Escherichia coli and in the archaeon Archaeoglobus fulgidus, protein sequences with no homologs were, on average, s...
Despite the success of PubMed and other search engines in managing the massive volume of biomedical literature and the retrieval of individual publications, grant-related data remains scattered and relatively inaccessible. This is problematic, as project and funding data has significant analytical value and could be integral to publication retrieva...
Understanding the relationship between genomic variation and variation in phenotypes for quantitative traits such as physiology, yield, fitness or behavior will provide important insights for both predicting adaptive evolution and for breeding schemes. A particular question is whether the genetic variation that influences quantitative phenotypes i...
Background
The length of a protein sequence is largely determined by its function, i.e. each functional group is associated with an optimal size. However, comparative genomics revealed that protein length may be affected by additional factors. In 2002, it was shown that in the bacterium Escherichia coli and the archaeon Archaeoglobus fulgidus, protein...
Gene expression profiling is being widely applied in cancer research to identify biomarkers for clinical endpoint prediction. Since RNA-seq provides a powerful tool for transcriptome-based applications beyond the limitations of microarrays, we sought to systematically evaluate the performance of RNA-seq-based and microarray-based classifiers in thi...
Development of drug responsive biomarkers from pre-clinical data is a critical step in drug discovery, as it enables patient stratification in clinical trial design. Such translational biomarkers can be validated in early clinical trial phases and utilized as a patient inclusion parameter in later stage trials. Here we present a study on building a...
Despite a growing number of studies evaluating cancer of prostate (CaP) specific gene alterations, oncogenic activation of the ETS Related Gene (ERG) by gene fusions remains the most validated cancer gene alteration in CaP. Prevalent gene fusions have been described between the ERG gene and promoter upstream sequences of androgen-inducible genes, p...
The concordance of RNA-sequencing (RNA-seq) with microarrays for genome-wide analysis of differential gene expression has not been rigorously assessed using a range of chemical treatment conditions. Here we use a comprehensive study design to generate Illumina RNA-seq and Affymetrix microarray data from the same liver samples of rats exposed in tri...
Recurrent mutations in histone-modifying enzymes imply key roles in tumorigenesis, yet their functional relevance is largely unknown. Here, we show that JARID1B, encoding a histone H3 lysine 4 (H3K4) demethylase, is frequently amplified and overexpressed in luminal breast tumors and a somatic mutation in a basal-like breast cancer results in the ga...
The rat has been used extensively as a model for evaluating chemical toxicities and for understanding drug mechanisms. However, its transcriptome across multiple organs, or developmental stages, has not yet been reported. Here we show, as part of the SEQC consortium efforts, a comprehensive rat transcriptomic BodyMap created by performing RNA-Seq o...
Discovery of candidate drug sensitivity predictive genomic models using pre-clinical data would significantly advance the ability to timely validate such models in clinical trials and utilize them for selecting appropriate treatments for patients. Here we report a case study using in vitro cell line viability data to build drug sensitivity models f...
Development of resistance is a significant clinical problem for virtually all targeted cancer therapies. We have generated a reproducible, patient-derived xenograft (PDX) model of acquired vemurafenib resistance to address these challenges. Continuous treatment of V600E melanoma tumors caused synchronous tumor stasis for approximately 7 weeks, fol...
Early full-term pregnancy is one of the most effective natural protections against breast cancer. To investigate this effect, we have characterized the global gene expression and epigenetic profiles of multiple cell types from normal breast tissue of nulliparous and parous women and carriers of BRCA1 or BRCA2 mutations. We found significant differe...
Background: Glioblastoma is the most common primary brain tumor in humans and carries a dismal prognosis, with a median survival time of 14 months. The TCGA consortium profiled >580 glioblastoma samples for various genomic and molecular characterizations, and has shown that the single histopathologically diagnosed cancer can be classified into...
The discovery of novel drug targets is a significant challenge in drug development. Although the human genome comprises approximately 30,000 genes, proteins encoded by fewer than 400 are used as drug targets in the treatment of diseases. Therefore, novel drug targets are extremely valuable as the source for first in class drugs. On the other hand,...
Prediction performance for 30 diseases. The file contains the ROC plots for all 30 diseases. The blue area around each curve represents the 95% confidence interval.
(DOC)
List of predicted drug targets for 30 diseases. The table contains the prioritized list of network objects for each disease. Each network object is listed with the name as contained in the Metabase resource and its prediction score. Furthermore, each network object is annotated with drug target information from Integrity, where known drug targets f...
Top 100 drug targets for 30 diseases. The table contains annotations for the top 100 drug target predictions for all 30 diseases. Each network object corresponds to either a known drug target for the given disease, a known drug target for other diseases, or no current drug target. Furthermore, network objects with increased expression levels are h...
Commonly predicted cancer drug targets. The table lists all network objects that were commonly predicted as cancer drug targets within the top 100 for six different cancer types. For each network object, the table highlights the types of cancer for which the network object is a known drug target. Furthermore, network objects corresponding to known...
As is the case with any OMICs technology, the value of proteomics data is defined by the degree of its functional interpretation in the context of phenotype. Functional analysis of proteomics profiles is inherently complex, as each of hundreds of detected proteins can belong to dozens of pathways, be connected in different context-specific group...
There is a resurgence of interest within the drug and biomarker development communities in the use of primary tumorgraft models as improved predictors of patient tumor response to novel therapeutic strategies. Despite perceived advantages over cell line-derived xenograft models, there are limited data comparing the genotype and phenotype of tumorgrafts to the donor...
Table S2. The spreadsheet contains the results of the differential gene expression analysis comparing the patient tumors that successfully developed tumorgrafts with those that did not. This file contains two spreadsheet tabs. The first tab displays the 491 gene probesets up-regulated in the tumors that successfully formed tumorgrafts (Group 2, c...
Table S1. The spreadsheet contains the results of the differential gene expression analysis comparing each patient’s tumor with the paired tumorgraft. This file contains two spreadsheet tabs. The first tab displays the 17 gene probesets up-regulated in tumorgrafts (Group 2, column C) relative to the originating patient tumors (Group 1, column B), a...
The ability to accurately predict the toxicity of drug candidates from their chemical structure is critical for guiding experimental drug discovery toward safer medicines. Under the guidance of the MetaTox consortium (Thomson Reuters, CA, USA), which comprised toxicologists from the pharmaceutical industry and government agencies, we created a comp...
Chondrosarcomas are among the most malignant skeletal tumors. Dedifferentiated chondrosarcoma is a highly aggressive subtype of chondrosarcoma, with lung metastases developing within a few months of diagnosis in 90% of patients. In this paper we performed comparative analyses of the transcriptomes of five individual metastatic lung lesions that wer...
Supplemental Table 1: The differential expression of a subset of genes in the “biased multifunctional signature” was validated by quantitative real-time PCR. The Pfaffl method was used to calculate the relative gene expression levels.
Supplemental Table 2: Datasets for the functional analysis: Up or Down-regulated genes in all the Met.- cell lines,...
It is generally believed that spontaneous tumors originate from a single cell and evolve into complex tissues composed of multiple cell types and characterized by very high morphological, physiological and genetic heterogeneity (Marusyk A, Polyak K, Biochim Biophys Acta 1805(1):105–117, 2010). In the process, cancer cells acquire six core biologica...
The molecular events leading to human embryonic stem cell (hESC) differentiation are the subject of considerable scrutiny. Here, we characterize an in vitro model that permits analysis of the earliest steps in the transition of hESC colonies to squamous epithelium on basic fibroblast growth factor withdrawal. A set of markers (GSC, CK18, Gata4, Eom...
Differential in vivo effects of cardiovascular drugs on plasma lipids.
Lists of potential targets obtained from the hepatic gene expression profile and lists of putative targets predicted from the chemical structure.
Differential effect of cardiovascular drugs on immune cell recruitment/chemotaxis.
Successful drug development has been hampered by a limited understanding of how to translate laboratory-based biological discoveries into safe and effective medicines. We have developed a generic method for predicting the effects of drugs on biological processes. Information derived from the chemical structure and experimental omics data from short...
Intratumor heterogeneity is a major clinical problem because tumor cell subtypes display variable sensitivity to therapeutics and may play different roles in progression. We previously characterized 2 cell populations in human breast tumors with distinct properties: CD44+CD24- cells that have stem cell-like characteristics, and CD44-CD24+ cells tha...
Introduction; Gene Content Classifiers and Functional Classifiers; Biological Pathways and Networks Have Different Properties as Functional Descriptors; Applications of Pathways as Functional Classifiers; Single Pathway Learning for Identifying Functional Descriptor Pathways; Multiple-Path Learning (MPL) Algorithm for Pathway Descriptors; Applications of MPL-D...
Correlation between histone modification and gene expression patterns in each sample. (A) Scatter plots comparing the level of histone modification and the level of gene expression. Each dot represents a gene. X-axis indicates mapped read counts around promoter region for the indicated histone mark and Y-axis indicates the median expression of the...
Examples of genes contained in K27 blocs. Patterns of K27 enrichment and gene expression are mutually exclusive. Examples showing clusters of genes located in K27 blocs and potentially silenced by this modification. Data was analyzed using SICER algorithm [28] using 10kb as window size. Significantly enriched regions and gene expression levels are...
Comparison of small-scale and standard ChIP-Seq protocols. (A) Schematic outline of ChIP-Seq protocol. Critical steps that required optimization (red), DNA purification steps (blue), and quality control qPCR steps (green) are indicated. (B) Correlation of small-scale ChIP-Seq data between the same cell type from two different individuals (left) and...
Summary of GeneGo pathway, network, and interactome analysis for Table S3. The file contains multiple worksheets. Legend is the same as that of Table S2.
(0.54 MB XLS)
Confirmation of cell purity and histone methylation patterns. (A) We performed qRT-PCR for known markers of CD44+ progenitor (red) and CD24+ differentiated luminal epithelial (blue) cells to confirm the success of the purification procedure. Part (15%) of the fractionated cells was used for RNA preparation whereas the remaining fraction (85%) was u...
Chromatin states for “Biv-Biv-Biv” and “Biv-K27-K27” genes in different cell types. (A) Mean expression of genes (right bar graph) with the indicated chromatin pattern (left panel) in CD24+ and CD44+ samples. Wilcoxon rank sum test was performed to identify significant differences between CD44+ and CD24+ samples within the same group and between-gr...
List of genes enriched for K27 mark in CD24+ or CD44+ cells or in both cell types. The Excel file contains three worksheets (CD44+K27+, CD24+K27+, and K27+ in both). Gene symbol, RefSeq ID, approved name, and chromosomal location are indicated.
(0.40 MB XLS)
List of differentially expressed genes enriched in K27me3 or K4me3 histone modifications. The file contains four worksheets. (1) CD44-high/K27+ genes: genes highly expressed in CD44+ cells and K27-enriched in either or in both cell types. (2) CD44-high/K27- genes: genes highly expressed in CD44+ cells and not K27-enriched in either cell type. (3) C...
List of genes affected by chromatin pattern changes. The file contains multiple worksheets. Genes showing specific chromatin pattern changes in hESC, CD44+ and CD24+ cells as described in Figure 5B. (A) Biv-K4-K4 (in hESC, CD44+ and CD24+, respectively), (B) Biv-K4-Biv, (C) Biv-K4-K27, (D) Biv-Biv-K4, (E) Biv-K27-K4, (F) Biv-Biv-Biv, (G) Biv-K27-K2...
List of genes associated with DMRs and K27-enriched regions. The file contains seven worksheets. Complete list of DMRs identified in CD44+ (1) and CD24+ (2) cells. List of DMR (-log10(p-value)>5) hypermethylated in CD24+ cells (CD24Met) (1) and DMR hypermethylated in CD44+ cells (CD44Met) (2). ID, -log10 (p-value), chromosomal location of the DMR,...
Examples of MSDK tag counts for selected genes with DMRs and their experimental validation and associations between DMRs and gene expression. (A) Heatmap depicting the clustering of samples based on tag counts for the 1,256 DMRs we identified. (B) Left: Schematic view of the indicated genomic region based on UCSC genome browser. Location of CpG isl...
Summary of GeneGo pathway, network, and interactome analysis for K27-enriched genes. The excel file contains multiple worksheets. GeneGo processes and canonical pathway maps are listed with p-values indicating the significance of enrichment for K27 enriched genes. Functional enrichment analysis by protein class. r: number of genes showing indicated...
Summary of GeneGo pathway, network, and interactome analysis for Table S5. The file contains multiple worksheets. Legend is the same as that of Table S2.
(0.68 MB XLS)
Description of human tissue samples used for the generation of SAGE-Seq, ChIP-Seq, and MSDK-Seq libraries and number of aligned reads in each ChIP-Seq library and number of total tags in SAGE-Seq and MSDK-Seq libraries.
(0.23 MB DOC)
Differentiation is an epigenetic program that involves the gradual loss of pluripotency and acquisition of cell type-specific features. Understanding these processes requires genome-wide analysis of epigenetic and gene expression profiles, which has been challenging in primary tissue samples due to the limited numbers of cells available. Here we descr...
Introduction:
Despite rapid progress in OMICs and computational technologies in compound safety assessment, drug failure rate due to toxicity is still unacceptably high. One reason for this is an inadequate interpretation of high-throughput preclinical data. Another reason is the poor mechanistic understanding of drug side effects as currently jus...
An important general concern in cancer research is how diverse genetic alterations and regulatory pathways can produce common signaling outcomes. In this study, we report the construction of cancer models that combine unique regulation and common signaling. We compared and functionally analyzed sets of genetic alterations, including somatic sequenc...
Medulloblastoma (MB) is the most common malignant brain tumor of children. To identify the genetic alterations in this tumor type, we searched for copy number alterations using high-density microarrays and sequenced all known protein-coding genes and microRNA genes using Sanger sequencing in a set of 22 MBs. We found that, on average, each tumor ha...
One unresolved issue in Cystic Fibrosis research is how functional loss of CFTR, a protein involved in chloride transport, results in chronic lung inflammation. Large scale experiments investigating protein or gene expression changes due to altered trafficking of the most common disease causing CFTR mutation (ΔF508) have produced long lists of chan...
Transcriptome profiling studies suggest that a large fraction of the genome is transcribed and many transcripts function independently of their protein-coding potential. The relevance of noncoding RNAs (ncRNAs) in normal physiological processes and in tumorigenesis is increasingly recognized. Here, we describe consistent and significant differences i...
Microarray-based classifiers and associated signature genes generated from various platforms are abundantly reported in the literature; however, the utility of the classifiers and signature genes in cross-platform prediction applications remains largely uncertain. As part of the MicroArray Quality Control Phase II (MAQC-II) project, we show in this...
Genomic biomarkers for the detection of drug-induced liver injury (DILI) from blood are urgently needed for monitoring drug safety. We used a unique data set as part of the Food and Drug Administration led MicroArray Quality Control Phase-II (MAQC-II) project consisting of gene expression data from the two tissues (blood and liver) to test cross-ti...
Gene expression signatures of toxicity and clinical response benefit both safety assessment and clinical practice; however, difficulties in connecting signature genes with the predicted end points have limited their application. The Microarray Quality Control Consortium II (MAQCII) project generated 262 signatures for ten clinical and three toxicol...
Gene expression data from microarrays are being applied to predict preclinical and clinical endpoints, but the reliability of these predictions has not been established. In the MAQC-II project, 36 independent teams analyzed six microarray data sets to generate predictive models for classifying a sample with respect to one of 13 endpoints indicative...