Qicheng Ma

Novartis Institutes for BioMedical Research, Cambridge, MA, USA

Publications (4) · 33.95 Total impact

  • Article: Uncovering mechanisms of transcriptional regulations by systematic mining of cis regulatory elements with gene expression profiles.
    ABSTRACT: Contrary to the traditional biology approach, where the expression patterns of a handful of genes are studied at a time, microarray experiments enable biologists to study the expression patterns of many genes simultaneously from gene expression profile data and decipher the underlying hidden biological mechanism from the observed gene expression changes. While the statistical significance of the gene expression data can be deduced by various methods, the biological interpretation of the data presents a challenge. A method, called CisTransMine, is proposed to help infer the underlying biological mechanisms for the observed gene expression changes in microarray experiments. Specifically, this method will predict potential cis-regulatory elements in promoter regions which could regulate gene expression changes. This approach builds on the MotifADE method published in 2004 and extends it with two modifications: up-regulated genes and down-regulated genes are tested separately and in addition, tests have been implemented to identify combinations of transcription factors that work synergistically. The method has been applied to a genome wide expression dataset intended to study myogenesis in a mouse C2C12 cell differentiation model. The results shown here both confirm the prior biological knowledge and facilitate the discovery of new biological insights. The results validate that the CisTransMine approach is a robust method to uncover the hidden transcriptional regulatory mechanisms that can facilitate the discovery of mechanisms of transcriptional regulation.
    BioData Mining 02/2008; 1(1):4.
  • Article: Genome-wide association analysis identifies loci for type 2 diabetes and triglyceride levels.
    ABSTRACT: New strategies for prevention and treatment of type 2 diabetes (T2D) require improved insight into disease etiology. We analyzed 386,731 common single-nucleotide polymorphisms (SNPs) in 1464 patients with T2D and 1467 matched controls, each characterized for measures of glucose metabolism, lipids, obesity, and blood pressure. With collaborators (FUSION and WTCCC/UKT2D), we identified and confirmed three loci associated with T2D (in a noncoding region near CDKN2A and CDKN2B, in an intron of IGF2BP2, and in an intron of CDKAL1) and replicated associations near HHEX and in SLC30A8 found by a recent whole-genome association study. We identified and confirmed association of a SNP in an intron of glucokinase regulatory protein (GCKR) with serum triglycerides. The discovery of associated variants in unsuspected genes and outside coding regions illustrates the ability of genome-wide association studies to provide potentially important clues to the pathogenesis of common diseases.
    Science 07/2007; 316(5829):1331-6. · 31.20 Impact Factor
  • Article: Clustering protein sequences with a novel metric transformed from sequence similarity scores and sequence alignments with neural networks.
    ABSTRACT: The sequencing of the human genome has enabled us to access a comprehensive list of genes (both experimental and predicted) for further analysis. While a majority of the approximately 30,000 known and predicted human coding genes are characterized and have been assigned at least one function, there remains a fair number of genes (about 12,000) for which no annotation has been made. The recent sequencing of other genomes has provided us with a huge amount of auxiliary sequence data which could help in the characterization of the human genes. Clustering these sequences into families is one of the first steps to perform comparative studies across several genomes. Here we report a novel clustering algorithm (CLUGEN) that has been used to cluster sequences of experimentally verified and predicted proteins from all sequenced genomes using a novel distance metric which is a neural network score between a pair of protein sequences. This distance metric is based on the pairwise sequence similarity score and the similarity between their domain structures. The distance metric is the probability that a pair of protein sequences are of the same Interpro family/domain, which facilitates the modelling of transitive homology closure to detect remote homologues. The hierarchical average clustering method is applied with the new distance metric. Benchmarking studies of our algorithm versus those reported in the literature shows that our algorithm provides clustering results with lower false positive and false negative rates. The clustering algorithm is applied to cluster several eukaryotic genomes and several dozens of prokaryotic genomes.
    BMC Bioinformatics 02/2005; 6:242. · 2.75 Impact Factor
  • Article: Clustering protein sequences with a novel metric transformed from sequence similarity scores and sequence alignments with neural networks
    ABSTRACT: Background: The sequencing of the human genome has enabled us to access a comprehensive list of genes (both experimental and predicted) for further analysis. While a majority of the approximately 30000 known and predicted human coding genes are characterized and have been assigned at least one function, there remains a fair number of genes (about 12000) for which no annotation has been made. The recent sequencing of other genomes has provided us with a huge amount of auxiliary sequence data which could help in the characterization of the human genes. Clustering these sequences into families is one of the first steps to perform comparative studies across several genomes. Results: Here we report a novel clustering algorithm (CLUGEN) that has been used to cluster sequences of experimentally verified and predicted proteins from all sequenced genomes using a novel distance metric which is a neural network score between a pair of protein sequences. This distance metric is based on the pairwise sequence similarity score and the similarity between their domain structures. The distance metric is the probability that a pair of protein sequences are of the same Interpro family/domain, which facilitates the modelling of transitive homology closure to detect remote homologues. The hierarchical average clustering method is applied with the new distance metric. Conclusion: Benchmarking studies of our algorithm versus those reported in the literature shows that our algorithm provides clustering results with lower false positive and false negative rates. The clustering algorithm is applied to cluster several eukaryotic genomes and several dozens of prokaryotic genomes.
    BMC Bioinformatics. 01/2005;