Charles DeLisi

New England Biolabs, Ipswich, Massachusetts, United States

Publications (177) · 783.08 Total impact

  • Source
    ABSTRACT: Experimental data exist for only a vanishingly small fraction of sequenced microbial genes. This community page discusses the progress made by the COMBREX project to address this important issue using both computational and experimental resources.
    PLoS Biology 08/2013; 11(8):e1001638. · 12.69 Impact Factor
  • Source
    ABSTRACT: With the rapid accumulation of our knowledge on diseases, disease-related genes and drug targets, network-based analysis plays an increasingly important role in systems biology, systems pharmacology and translational science. The new release of VisANT aims to provide new functions to facilitate the convenient network analysis of diseases, therapies, genes and drugs. With improved understanding of the mechanisms of complex diseases and drug actions through network analysis, novel drug methods (e.g., drug repositioning, multi-target drug and combination therapy) can be designed. More specifically, the new update includes (i) integrated search and navigation of disease and drug hierarchies; (ii) integrated disease-gene, therapy-drug and drug-target association to aid the network construction and filtering; (iii) annotation of genes/drugs using disease/therapy information; (iv) prediction of associated diseases/therapies for a given set of genes/drugs using enrichment analysis; (v) network transformation to support construction of versatile networks of drugs, genes, diseases and therapies; (vi) an enhanced user interface using docking windows to allow easy customization of node and edge properties, with a built-in legend node to distinguish different node types. VisANT is freely available at: http://visant.bu.edu.
    Nucleic Acids Research 05/2013; · 8.81 Impact Factor
  • Tun-Hsiang Yang, Mark Kon, Charles DeLisi
    ABSTRACT: A host of data on genetic variation from the Human Genome and International HapMap projects, and advances in high-throughput genotyping technologies, have made genome-wide association (GWA) studies technically feasible. GWA studies help in the discovery and quantification of the genetic components of disease risks, many of which have not been unveiled before and have opened a new avenue to understanding disease, treatment, and prevention. This chapter presents an overview of GWA, an important tool for discovering regions of the genome that harbor common genetic variants that confer susceptibility to various diseases or health outcomes in the post-Human Genome Project era. A tutorial on how to conduct a GWA study and some practical challenges specifically related to the GWA design is presented, followed by a detailed GWA case study involving the identification of loci associated with glioma as an example and an illustration of current technologies.
    Methods in molecular biology (Clifton, N.J.) 01/2013; 939:233-51. · 1.29 Impact Factor
  • ABSTRACT: We demonstrate an accurate, quantitative, and label-free optical technology for high-throughput studies of receptor-ligand interactions, and apply it to TATA binding protein (TBP) interactions with oligonucleotides. We present a simple method to prepare single-stranded and double-stranded DNA microarrays with comparable surface density, ensuring an accurate comparison of TBP activity with both types of DNA. In particular, we find that TBP binds tightly to single-stranded DNA, especially to stretches of polythymine (poly-T), as well as to the traditional TATA box. We further investigate the correlation of TBP activity with various lengths of DNA and find that the number of TBPs bound to DNA increases >7-fold as the oligomer length increases from 9 to 40. Finally, we perform a full human genome analysis and discover that 35.5% of human promoters have poly-T stretches. In summary, we report, for the first time to our knowledge, the activity of TBP with poly-T stretches by presenting an elegant stepwise analysis of multiple techniques: discovery by a novel quantitative detection of microarrays, confirmation by a traditional gel electrophoresis, and a full genome prediction with computational analyses.
    Biophysical Journal 10/2012; 103(7):1510-7. · 3.67 Impact Factor
  • Source
    Shinuk Kim, Mark Kon, Charles DeLisi
    ABSTRACT: BACKGROUND: Molecular markers based on gene expression profiles have been used in experimental and clinical settings to distinguish cancerous tumors in stage, grade, survival time, metastasis, and drug sensitivity. However, most significant gene markers are unstable (not reproducible) among data sets. We introduce a standardized method for representing cancer markers as 2-level hierarchical feature vectors, with a basic gene level as well as a second level of (more stable) pathway markers, for the purpose of discriminating cancer subtypes. This extends standard gene expression arrays with new pathway-level activation features obtained directly from off-the-shelf gene set enrichment algorithms such as GSEA. Such so-called pathway-based expression arrays are significantly more reproducible across datasets. Such reproducibility will be important for clinical usefulness of genomic markers, and augment currently accepted cancer classification protocols. RESULTS: The present method produced more stable (reproducible) pathway-based markers for discriminating breast cancer metastasis and ovarian cancer survival time. Between two datasets for breast cancer metastasis, the intersection of standard significant gene biomarkers totaled 7.47% of selected genes, compared to 17.65% using pathway-based markers; the corresponding percentages for ovarian cancer datasets were 20.65% and 33.33% respectively. Three pathways, consisting of Type_1_diabetes mellitus, Cytokine-cytokine_receptor_interaction and Hedgehog_signaling (all previously implicated in cancer), are enriched in both the ovarian long survival and breast non-metastasis groups. In addition, integrating pathway and gene information, we identified five (ID4, ANXA4, CXCL9, MYLK, FBXL7) and six (SQLE, E2F1, PTTG1, TSTA3, BUB1B, MAD2L1) known cancer genes significant for ovarian and breast cancer respectively. CONCLUSIONS: Standardizing the analysis of genomic data in the process of cancer staging, classification and analysis is important as it has implications for both pre-clinical as well as clinical studies. The paradigm of diagnosis and prediction using pathway-based biomarkers as features can be an important part of the process of biomarker-based cancer analysis, and the resulting canonical (clinically reproducible) biomarkers can be important in standardizing genomic data. We expect that identification of such canonical biomarkers will improve clinical utility of high-throughput datasets for diagnostic and prognostic applications. REVIEWERS: This article was reviewed by John McDonald (nominated by I. King Jordon), Eugene Koonin, Nathan Bowen (nominated by I. King Jordon), and Ekaterina Kotelnikova (nominated by Mikhail Gelfand).
    Biology Direct 07/2012; 7(1):21. · 2.72 Impact Factor
  • Source
    ABSTRACT: Identification of active causal regulators is a crucial problem in understanding disease mechanisms or finding drug targets. Methods that infer causal regulators directly from primary data have been proposed and successfully validated in some cases. These methods necessarily require very large sample sizes or a mix of different data types. Recent studies have shown that prior biological knowledge can successfully boost a method's ability to find regulators. We present a simple data-driven method, Correlation Set Analysis (CSA), for comprehensively detecting active regulators in disease populations by integrating co-expression analysis and a specific type of literature-derived causal relationships. Instead of investigating the co-expression level between regulators and their regulatees, we focus on coherence of regulatees of a regulator. Using simulated datasets we show that our method performs very well at recovering even weak regulatory relationships with a low false discovery rate. Using three separate real biological datasets we were able to recover well known, and as yet undescribed, active regulators for each disease population. The results are represented as a rank-ordered list of regulators, and reveal both single and higher-order regulatory relationships. CSA is an intuitive data-driven way of selecting directed perturbation experiments that are relevant to a disease population of interest and represent a starting point for further investigation. Our findings demonstrate that combining co-expression analysis on regulatee sets with a literature-derived network can successfully identify causal regulators and help develop possible hypotheses to explain disease progression.
    BMC Bioinformatics 03/2012; 13:46. · 3.02 Impact Factor
  • Source
    ABSTRACT: The cost and time to develop a drug continues to be a major barrier to widespread distribution of medication. Although the genomic revolution appears to have had little impact on this problem, and might even have exacerbated it because of the flood of additional and usually ineffective leads, the emergence of high throughput resources promises the possibility of rapid, reliable and systematic identification of approved drugs for originally unintended uses. In this paper we develop and apply a method for identifying such repositioned drug candidates against breast cancer, myelogenous leukemia and prostate cancer by looking for inverse correlations between the most perturbed gene expression levels in human cancer tissue and the most perturbed expression levels induced by bioactive compounds. The method uses variable gene signatures to identify bioactive compounds that modulate a given disease. This is in contrast to previous methods that use small and fixed signatures. This strategy is based on the observation that diseases stem from failed/modified cellular functions, irrespective of the particular genes that contribute to the function, i.e., this strategy targets the functional signatures for a given cancer. This function-based strategy broadens the search space for effective drugs with an impressive hit rate. Among the 79, 94 and 88 candidate drugs for breast cancer, myelogenous leukemia and prostate cancer, 32%, 13% and 17% respectively are either FDA-approved/in-clinical-trial drugs, or drugs with suggestive literature evidence, with an FDR of 0.01. These findings indicate that the method presented here could lead to a substantial increase in efficiency in drug discovery and development, and has potential application for personalized medicine.
    PLoS Computational Biology 02/2012; 8(2):e1002347. · 4.87 Impact Factor
  • Source
    ABSTRACT: A central goal of biology is understanding and describing the molecular basis of plasticity: the sets of genes that are combinatorially selected by exogenous and endogenous environmental changes, and the relations among the genes. The most viable current approach to this problem consists of determining whether sets of genes are connected by some common theme, e.g. genes from the same pathway are overrepresented among those whose differential expression in response to a perturbation is most pronounced. There are many approaches to this problem, and the results they produce show a fair amount of dispersion, but they all fall within a common framework consisting of a few basic components. We critically review these components, suggest best practices for carrying out each step, and propose a voting method for meeting the challenge of assessing different methods on a large number of experimental data sets in the absence of a gold standard.
    Briefings in Bioinformatics 09/2011; 13(3):281-91. · 5.30 Impact Factor
  • Source
    ABSTRACT: Glioblastoma multiforme (GBM) tends to occur between the ages of 45 and 70. This relatively early onset and its poor prognosis make the impact of GBM on public health far greater than would be suggested by its relatively low frequency. Tissue and blood samples have now been collected for a number of populations, and predisposing alleles have been sought by several different genome-wide association (GWA) studies. The Cancer Genome Atlas (TCGA) at NIH has also collected a considerable amount of data. Because of the low concordance between the results obtained using different populations, only 14 predisposing single nucleotide polymorphism (SNP) candidates in five genomic regions have been replicated in two or more studies. The purpose of this paper is to present an improved approach to biomarker identification. Association analysis was performed with control of population stratifications using the EIGENSTRAT package, under the null hypothesis of "no association between GBM and control SNP genotypes," based on an additive inheritance model. Genes that are strongly correlated with identified SNPs were determined by linkage disequilibrium (LD) or expression quantitative trait locus (eQTL) analysis. A new approach that combines meta-analysis and pathway enrichment analysis identified additional genes. (i) A meta-analysis of SNP data from TCGA and the Adult Glioma Study identifies 12 predisposing SNP candidates, seven of which are reported for the first time. These SNPs fall in five genomic regions (5p15.33, 9p21.3, 1p21.2, 3q26.2 and 7p15.3), three of which have not been previously reported. (ii) 25 genes are strongly correlated with these 12 SNPs, eight of which are known to be cancer-associated. (iii) The relative risk for GBM is highest for risk allele combinations on chromosomes 1 and 9. (iv) A combined meta-analysis/pathway analysis identified an additional four genes. All of these have been identified as cancer-related, but have not been previously associated with glioma. (v) Some SNPs that do not occur reproducibly across populations are in reproducible (invariant) pathways, suggesting that they affect the same biological process, and that population discordance can be partially resolved by evaluating processes rather than genes. We have uncovered 29 glioma-associated gene candidates, 12 of them known to be cancer related (p = 1.4 × 10⁻⁶), providing additional statistical support for the relevance of the new candidates. This additional information on risk loci is potentially important for identifying Caucasian individuals at risk for glioma, and for assessing relative risk.
    BMC Medical Genomics 08/2011; 4:63. · 3.91 Impact Factor
  • Charles DeLisi, Ugur Sezerman, Rakefet Rosenfeld
    ABSTRACT: Computational methods for identifying functional properties of proteins are briefly discussed. The methods lead to the concept of structure-function motif. A specific example is alpha amphipathicity as an indicator of antigenicity. This motif, though useful for planning experiments, is not sufficiently reliable to provide the basis for vaccine design. Recent progress on docking strategies based on structural analyses may provide methods that will be useful for both protein and nucleic acid receptors.
    07/2011: pages 443-447;
  • Source
    Yue Fan, M. Kon, Shinuk Kim, C. DeLisi
    ABSTRACT: Gene expression (micro array) data have been used widely in bioinformatics. The expression data of a large number of genes from small numbers of subjects are used to identify informative biomarkers that may predict or help in diagnosing some disorders. More recently, increasing amounts of information from underlying relationships of the expressed genes have become available, and workers have started to investigate algorithms which can use such a priori information to improve classification or regression based on gene expression. In this paper, we describe three novel machine learning algorithms for regularizing (smoothing) micro array expression values defined on gene sets with known prior network or metric structures, and which exploit this gene interaction information. These regularized expression values can be used with any machine classifier with the goal of better classification. In this paper, standard smoothing (denoising) techniques previously developed for functions on Euclidean spaces are extended to allow smoothing of micro array expression feature vectors using distance measures defined by biological networks. Such a priori smoothing (denoising) of the feature vectors using metrics on the index space (here the space of genes) yields better signal to noise ratios in the data. When tested on two breast cancer datasets, support vector machine classifiers trained on the smoothed expression values obtain better areas under ROC curves in two cancer datasets.
    Machine Learning and Applications (ICMLA), 2010 Ninth International Conference on; 01/2011
  • Source
    ABSTRACT: COMBREX (http://combrex.bu.edu) is a project to increase the speed of the functional annotation of new bacterial and archaeal genomes. It consists of a database of functional predictions produced by computational biologists and a mechanism for experimental biochemists to bid for the validation of those predictions. Small grants are available to support successful bids.
    Nucleic Acids Research 01/2011; 39(Database issue):D11-4. · 8.81 Impact Factor
  • ABSTRACT: We develop a general method to identify gene networks from pair-wise correlations between genes in a microarray data set and apply it to a public prostate cancer gene expression data set from 69 primary prostate tumors. We define the degree of a node as the number of genes significantly associated with the node and identify hub genes as those with the highest degree. The correlation network was pruned using transcription factor binding information in VisANT (http://visant.bu.edu/) as a biological filter. The reliability of hub genes was determined using a strict permutation test. Separate networks for normal prostate samples, and prostate cancer samples from African Americans (AA) and European Americans (EA) were generated and compared. We found that the same hubs control disease progression in AA and EA networks. Combining AA and EA samples, we generated networks for low (<7) and high (≥7) Gleason grade tumors. A comparison of their major hubs with those of the network for normal samples identified two types of changes associated with disease: (i) Some hub genes increased their degree in the tumor network compared to their degree in the normal network, suggesting that these genes are associated with gain of regulatory control in cancer (e.g. possible turning on of oncogenes). (ii) Some hubs reduced their degree in the tumor network compared to their degree in the normal network, suggesting that these genes are associated with loss of regulatory control in cancer (e.g. possible loss of tumor suppressor genes). A striking result was that for both AA and EA tumor samples, STAT5a, CEBPB and EGR1 are major hubs that gain neighbors compared to the normal prostate network. Conversely, HIF-1α is a major hub that loses connections in the prostate cancer network compared to the normal prostate network. We also find that the degree of these hubs changes progressively from normal to low grade to high grade disease, suggesting that these hubs are master regulators of prostate cancer and mark disease progression. STAT5a was identified as a central hub, with ~120 neighbors in the prostate cancer network and only 81 neighbors in the normal prostate network. Of the 120 neighbors of STAT5a, 57 are known cancer related genes, known to be involved in functional pathways associated with tumorigenesis. Our method is general and can easily be extended to identify and study networks associated with any two phenotypes.
    Genome informatics. International Conference on Genome Informatics 07/2010; 24(1):139-53.
  • ABSTRACT: To identify a robust panel of microRNA signatures that can classify tumor from normal kidney using microRNA expression levels. Mounting evidence suggests that microRNAs are key players in essential cellular processes and that their expression pattern can serve as diagnostic biomarkers for cancerous tissues. We selected 28 clear-cell type human renal cell carcinoma (ccRCC) samples from patient-matched specimens to perform high-throughput, quantitative real-time polymerase chain reaction analysis of microRNA expression levels. The data were subjected to rigorous statistical analyses and hierarchical clustering to produce a discrete set of microRNAs that can robustly distinguish ccRCC from their patient-matched normal kidney tissue samples with high confidence. Thirty-five microRNAs were found that can robustly distinguish ccRCC from their patient-matched normal kidney tissue samples with high confidence. Among this set of 35 signature microRNAs, 26 were found to be consistently downregulated and 9 consistently upregulated in ccRCC relative to normal kidney samples. Two microRNAs, namely, miR-155 and miR-21, commonly found to be upregulated in other cancers, and miR-210, induced by hypoxia, were also identified as overexpressed in ccRCC in our study. MicroRNAs identified as downregulated in our study can be correlated to common chromosome deletions in ccRCC. Our analysis is a comprehensive, statistically relevant study that identifies the microRNAs dysregulated in ccRCC, which can serve as the basis of molecular markers for diagnosis.
    Urology 04/2010; 75(4):835-41. · 2.42 Impact Factor
  • Source
    ABSTRACT: A novel method is proposed for direct detection of DNA hybridization on microarrays. Optical interferometry is used for label-free sensing of biomolecular accumulation on glass surfaces, enabling dynamic detection of interactions. Capabilities of the presented method are demonstrated by high-throughput sensing of solid-phase hybridization of oligonucleotides. Hybridization of surface immobilized probes with 20 base pair-long target oligonucleotides was detected by comparing the label-free microarray images taken before and after hybridization. Through dynamic data acquisition during denaturation by washing the sample with low ionic concentration buffer, melting of duplexes with a single-nucleotide mismatch was distinguished from perfectly matching duplexes with high confidence (>97%). The presented technique is simple, robust, and accurate, and eliminates the need for labels or secondary reagents to monitor the oligonucleotide hybridization.
    Biosensors & Bioelectronics 03/2010; 25(7):1789-95. · 6.45 Impact Factor
  • Source
    ABSTRACT: One of the important challenges to post-genomic biology is relating observed phenotypic alterations to the underlying collective alterations in genes. Current inferential methods, however, invariably omit large bodies of information on the relationships between genes. We present a method that takes account of such information - expressed in terms of the topology of a correlation network - and we apply the method in the context of current procedures for gene set enrichment analysis.
    Genome biology 02/2010; 11(2):R23. · 10.30 Impact Factor
  • Genome Informatics. 01/2010;
  • Source
    Bolan Linghu, Charles DeLisi
    ABSTRACT: Surprising correlations between human disease phenotypes are emerging. Recent work now reveals startling phenotype connections between species, which could provide new disease models.
    Genome biology 01/2010; 11(4):116. · 10.30 Impact Factor
  • Source
    ABSTRACT: The representation of a high dimensional machine learning (ML) feature space $F$ as a function space for the purpose of denoising data is introduced. We illustrate an application of such a representation of feature vectors by applying a local averaging denoising method for functions on Euclidean and metric spaces (together with its graph generalization) to the regularization of feature vectors in ML. We first discuss this technique for noisy functions on $\mathbb{R}$, and then extend it to functions defined on graphs and networks. This method exhibits a paradoxical property of the bias-variance problem in machine learning, namely, that as the scale over which averages are taken decreases, the error rate for classification first decreases and then increases. This approach is tested on two benchmark DNA microarray data sets used for classification of breast tumors based on predicted metastasis.
    Communications in Mathematical Analysis. 01/2010; 8(3).
  • Source
    ABSTRACT: We integrate 16 genomic features to construct an evidence-weighted functional-linkage network comprising 21,657 human genes. The functional-linkage network is used to prioritize candidate genes for 110 diseases, and to reliably disclose hidden associations between disease pairs having dissimilar phenotypes, such as hypercholesterolemia and Alzheimer's disease. Many of these disease-disease associations are supported by epidemiology, but with no previous genetic basis. Such associations can drive novel hypotheses on molecular mechanisms of diseases and therapies.
    Genome biology 10/2009; 10(9):R91. · 10.30 Impact Factor
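
The prostate cancer network abstract listed above defines the degree of a node as the number of genes significantly associated with it, and hub genes as the nodes of highest degree. The following is a minimal sketch of that idea only. It assumes a fixed cutoff on the absolute Pearson correlation as a stand-in for the significance test, and it does not reproduce the permutation test or the transcription factor filtering the abstract mentions. The function names, the cutoff and the toy data are all hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant profile is treated as uncorrelated
    return sxy / sqrt(sxx * syy)


def node_degrees(expr, threshold):
    """Count, for each gene, how many other genes have |r| >= threshold.

    expr maps a gene name to its list of expression values across samples.
    The fixed threshold is an assumption standing in for a significance test.
    """
    genes = list(expr)
    degree = {g: 0 for g in genes}
    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            if abs(pearson(expr[g], expr[h])) >= threshold:
                degree[g] += 1
                degree[h] += 1
    return degree


def top_hubs(degree, k=1):
    """Return the k genes of highest degree (ties broken by name)."""
    return sorted(degree, key=lambda g: (-degree[g], g))[:k]


# Hypothetical toy data: geneA is the sum of geneB and geneC, which are
# uncorrelated with each other; geneD is uncorrelated with all three.
toy = {
    "geneA": [2.0, 0.0, 0.0, -2.0],
    "geneB": [1.0, -1.0, 1.0, -1.0],
    "geneC": [1.0, 1.0, -1.0, -1.0],
    "geneD": [1.0, -1.0, -1.0, 1.0],
}

degrees = node_degrees(toy, threshold=0.7)
hubs = top_hubs(degrees, k=1)
```

Comparing the degree of the same gene across two such networks (for example tumor versus normal samples) gives the gain or loss of neighbors that the abstract describes.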

Publication Stats

6k Citations
783.08 Total Impact Points

Institutions

  • 2013
    • New England Biolabs
      Ipswich, Massachusetts, United States
  • 1991–2013
    • Boston University
      • College of Engineering
      • Center for Advanced Biotechnology
      • Department of Biomedical Engineering
      • Department of Electrical and Computer Engineering
      Boston, Massachusetts, United States
  • 2011
    • University of Massachusetts Medical School
      Worcester, Massachusetts, United States
  • 2010
    • Rutgers, The State University of New Jersey
      New Brunswick, New Jersey, United States
    • Novartis Institutes for BioMedical Research
      Cambridge, Massachusetts, United States
  • 2007
    • Broad Institute of MIT and Harvard
      Cambridge, Massachusetts, United States
  • 1993–2006
    • University of Massachusetts Boston
      Boston, Massachusetts, United States
    • NCI-Frederick
      Maryland, United States
  • 2001
    • University of California, Berkeley
      • Department of Chemistry
      Berkeley, CA, United States
  • 1993–1996
    • Iowa State University
      • Department of Mathematics
      Ames, IA, United States
  • 1990–1991
    • Icahn School of Medicine at Mount Sinai
      Manhattan, New York, United States
  • 1980–1990
    • National Institutes of Health
      • Branch of Metabolism
      • Chemical Biology Laboratory
      Bethesda, MD, United States
  • 1982–1988
    • National Cancer Institute (USA)
      • Metabolism Branch
      Maryland, United States