Xiaoman Li

University of Central Florida, Orlando, FL, USA

Publications (18) · 68.07 Total impact

  • Article: ChIPModule: systematic discovery of transcription factors and their cofactors from ChIP-seq data.
    ABSTRACT: We have developed a novel approach called ChIPModule to systematically discover transcription factors and their cofactors from ChIP-seq data. Given a ChIP-seq dataset and the binding patterns of a large number of transcription factors, ChIPModule can efficiently identify groups of transcription factors, whose binding sites significantly co-occur in the ChIP-seq peak regions. By testing ChIPModule on simulated data and experimental data, we have shown that ChIPModule identifies known cofactors of transcription factors, and predicts new cofactors that are supported by literature. ChIPModule provides a useful tool for studying gene transcriptional regulation.
    Pacific Symposium on Biocomputing 01/2013
  • Article: Systematic Prediction of cis-Regulatory Elements in the Chlamydomonas reinhardtii Genome Using Comparative Genomics.
    Jun Ding, Xiaoman Li, Haiyan Hu
    ABSTRACT: Chlamydomonas reinhardtii is one of the most important microalgae model organisms and has been widely studied toward the understanding of chloroplast functions and various cellular processes. Further exploitation of C. reinhardtii as a model system to elucidate various molecular mechanisms and pathways requires systematic study of gene regulation. However, there is a general lack of genome-scale gene regulation studies, such as global cis-regulatory element (CRE) identification, in C. reinhardtii. Recently, large-scale genomic data in microalgae species have become available, which enable the development of efficient computational methods to systematically identify CREs and characterize their roles in microalgae gene regulation. Here, we performed in silico CRE identification at the whole genome level in C. reinhardtii using a comparative genomics-based method. We predicted a large number of CREs in C. reinhardtii that are consistent with experimentally verified CREs. We also discovered that a large percentage of these CREs form combinations and have the potential to work together for coordinated gene regulation in C. reinhardtii. Multiple lines of evidence from literature, gene transcriptional profiles, and gene annotation resources support our prediction. The predicted CREs will serve, to our knowledge, as the first large-scale collection of CREs in C. reinhardtii to facilitate further experimental study of microalgae gene regulation. The accompanying software tool and the predictions in C. reinhardtii are also made available through a Web-accessible database (http://hulab.ucf.edu/research/projects/Microalgae/sdcre/motifcomb.html).
    Plant Physiology 08/2012; 160(2):613-23. · 6.53 Impact Factor
  • Article: Motif analysis unveils the possible co-regulation of chloroplast genes and nuclear genes encoding chloroplast proteins.
    ABSTRACT: Chloroplasts play critical roles in land plant cells. Despite their importance and the availability of at least 200 sequenced chloroplast genomes, the number of known DNA regulatory sequences in chloroplast genomes is limited. In this paper, we designed computational methods to systematically study putative DNA regulatory sequences in intergenic regions near chloroplast genes in seven plant species and in promoter sequences of nuclear genes in Arabidopsis and rice. We found that -35/-10 elements alone cannot explain the transcriptional regulation of chloroplast genes. We also concluded that motifs shared by the intergenic sequences of most chloroplast genes are unlikely to exist, indicating that these genes are regulated differently. Finally and surprisingly, we found that five conserved motifs, each of which occurs in no more than six chloroplast intergenic sequences, are significantly shared by promoters of nuclear genes encoding chloroplast proteins. By integrating information from gene function annotation, protein subcellular localization analyses, protein-protein interaction data, and gene expression data, we further showed support for the functionality of these conserved motifs. Our study implies the existence of unknown nuclear-encoded transcription factors that regulate both chloroplast genes and nuclear genes encoding chloroplast proteins, which sheds light on the transcriptional regulation of chloroplast genes.
    Plant Molecular Biology 06/2012; 80(2):177-87. · 4.15 Impact Factor
  • Article: Transcriptional regulation of co-expressed microRNA target genes.
    Ying Wang, Xiaoman Li, Haiyan Hu
    ABSTRACT: MicroRNAs play pivotal roles in gene regulation. Despite various research efforts on microRNAs, how microRNA target genes are transcriptionally regulated and how the transcriptional regulation of microRNA target genes relates to that of the microRNA genes are not well studied. By investigating the transcriptional regulation of microRNA target genes, we found that different groups of target genes of the same microRNA are co-expressed under different conditions, and these groups rarely overlap with each other for the majority of microRNAs. We also discovered that co-expressed microRNA target genes are often co-regulated, and different groups of target genes of the same microRNA are often regulated differently. In addition, we observed that transcription factors regulating a microRNA gene often regulate its target genes. Our study sheds light on the regulation of microRNA target genes, which will facilitate the prediction of microRNA target genes and the understanding of the transcriptional regulation of microRNA genes.
    Genomics 12/2011; 98(6):445-52. · 3.02 Impact Factor
  • Article: Thousands of cis-regulatory sequence combinations are shared by Arabidopsis and poplar.
    Jun Ding, Haiyan Hu, Xiaoman Li
    ABSTRACT: The identification of cis-regulatory modules (CRMs) can greatly advance our understanding of gene regulatory mechanisms. Despite the existence of binding sites of more than three transcription factors (TFs) in a CRM, studies in plants often consider only the co-occurrence of binding sites of one or two TFs. In addition, CRM studies in plants are limited to combinations of only a few families of TFs. It is thus not clear how widely plant TFs work together, which TFs work together to regulate plant genes, and how the combinations of these TFs are shared by different plants. To fill these gaps, we applied a frequent pattern-mining-based approach to identify frequently used cis-regulatory sequence combinations in the promoter sequences of two plant species, Arabidopsis (Arabidopsis thaliana) and poplar (Populus trichocarpa). A cis-regulatory sequence here corresponds to a DNA motif bound by a TF. We identified 18,638 combinations composed of two to six cis-regulatory sequences that are shared by the two plant species. In addition, with known cis-regulatory sequence combinations, gene function annotation, gene expression data, and known functional gene sets, we showed that the functionality of at least 96.8% and 65.2% of these shared combinations in Arabidopsis is partially supported, under a false discovery rate of 0.1 and 0.05, respectively. Finally, we discovered that 796 of the 18,638 combinations might relate to functions that are important in bioenergy research. Our work will facilitate the study of gene transcriptional regulation in plants.
    Plant Physiology 11/2011; 158(1):145-55. · 6.53 Impact Factor
  • Article: Transcriptional regulation of mammalian miRNA genes.
    Brian C Schanen, Xiaoman Li
    ABSTRACT: MicroRNAs (miRNAs) are members of a growing family of non-coding transcripts, 21-23 nucleotides long, which regulate a diverse collection of biological processes and various diseases by RNA-mediated gene-silencing mechanisms. While many current studies focus on defining the regulatory functions of miRNAs, few are directed towards how miRNA genes are themselves transcriptionally regulated. Recent studies of miRNA transcription have identified RNA polymerase II as the major polymerase of miRNAs; however, little is known of the structural features of miRNA promoters, especially those of mammalian miRNAs. Here, we review the current literature regarding features conserved among miRNA promoters useful for their detection and the novel methodologies currently available to enable researchers to advance our understanding of the transcriptional regulation of miRNA genes.
    Genomics 10/2010; 97(1):1-6. · 3.02 Impact Factor
  • Article: Systematic identification of conserved motif modules in the human genome.
    ABSTRACT: The identification of motif modules, groups of multiple motifs frequently occurring in DNA sequences, is one of the most important tasks necessary for annotating the human genome. Current approaches to identifying motif modules are often restricted to searches within promoter regions or rely on multiple genome alignments. However, the promoter regions only account for a limited number of locations where transcription factor binding sites can occur, and multiple genome alignments often cannot align binding sites with their true counterparts because of the short and degenerate nature of these transcription factor binding sites. To identify motif modules systematically, we developed a computational method for the entire non-coding regions around human genes that does not rely upon the use of multiple genome alignments. First, we selected orthologous DNA blocks approximately 1 kilobase in length based on discontiguous sequence similarity. Next, we scanned the conserved segments in these blocks using known motifs in the TRANSFAC database. Finally, a frequent pattern mining technique was applied to identify motif modules within these blocks. In total, with a false discovery rate cutoff of 0.05, we predicted 3,161,839 motif modules, 90.8% of which are supported by various forms of functional evidence. Compared with experimental data from 14 ChIP-seq experiments, on average, our methods predicted 69.6% of the ChIP-seq peaks with TFBSs of multiple TFs. Our findings also show that many motif modules have distance preference and order preference among the motifs, which further supports the functionality of these predictions. Our work provides a large-scale prediction of motif modules in mammals, which will facilitate the understanding of gene regulation in a systematic way.
    BMC Genomics 10/2010; 11:567. · 4.07 Impact Factor
  • Chapter: Transcription Factor Binding Site Identification by Phylogenetic Footprinting
    Haiyan Hu, Xiaoman Li
    06/2010: pages 113-131;
  • Article: A new measurement of sequence conservation.
    Xiaohui Cai, Haiyan Hu, Xiaoman Li
    ABSTRACT: Understanding sequence conservation is important for the study of sequence evolution and for the identification of functional regions of the genome. Current studies often measure sequence conservation based on every position in contiguous regions. Therefore, a large number of functional regions that contain conserved segments separated by relatively long divergent segments are ignored. Our goal in this paper is to define a new measurement of sequence conservation such that both contiguously conserved regions and discontiguously conserved regions can be detected based on this new measurement. Here and in the following, conserved regions are those regions that share similarity higher than a pre-specified similarity threshold with their homologous regions in other species. That is, conserved regions are good candidates of functional regions and may not be always functional. Moreover, conserved regions may contain long and divergent segments. To identify both discontiguously and contiguously conserved regions, we proposed a new measurement of sequence conservation, which measures sequence similarity based only on the conserved segments within the regions. By defining conserved segments using the local alignment tool CHAOS, under the new measurement, we analyzed the conservation of 1642 experimentally verified human functional non-coding regions in the mouse genome. We found that the conservation in at least 11% of these functional regions could be missed by the current conservation analysis methods. We also found that 72% of the mouse homologous regions identified based on the new measurement are more similar to the human functional sequences than the aligned mouse sequences from the UCSC genome browser. We further compared BLAST and discontiguous MegaBLAST with our method. We found that our method picks up many more conserved segments than BLAST and discontiguous MegaBLAST in these regions. 
    It is critical to have a new measurement of sequence conservation that is based only on the conserved segments in one region. Such a new measurement can aid the identification of better local "orthologous" regions. It will also shed light on the identification of new types of conserved functional regions in vertebrate genomes.
    BMC Genomics 12/2009; 10:623. · 4.07 Impact Factor
  • Article: Evolution of Drosophila ribosomal protein gene core promoters.
    Xiaotu Ma, Kangyu Zhang, Xiaoman Li
    ABSTRACT: The coordinated expression of ribosomal protein genes (RPGs) has been well documented in many species. Previous analyses of RPG promoters focus only on fungi and mammals. Recognizing this gap and using a comparative genomics approach, we utilize a motif-finding algorithm that incorporates cross-species conservation to identify several significant motifs in Drosophila RPG promoters. As a result, significant differences in the enriched motifs in RPG promoters are found among Drosophila, fungi, and mammals, demonstrating the evolutionary dynamics of the ribosomal gene regulatory network. We also report a motif present in similar numbers of RPGs among Drosophila species which does not appear to be conserved at the individual RPG gene level. A module-wise stabilizing selection theory is proposed to explain this observation. Overall, our results provide significant insight into the fast-evolving nature of transcriptional regulation in the RPG module.
    Gene 12/2008; 432(1-2):54-9. · 2.34 Impact Factor
  • Article: MOPAT: a graph-based method to predict recurrent cis-regulatory modules from known motifs.
    Jianfei Hu, Haiyan Hu, Xiaoman Li
    ABSTRACT: The identification of cis-regulatory modules (CRMs) can greatly advance our understanding of eukaryotic regulatory mechanisms. Current methods to predict CRMs from known motifs either depend on multiple alignments or can only deal with a small number of known motifs provided by users. These methods are problematic when binding sites are not well aligned in multiple alignments or when the number of input known motifs is large. We thus developed a new CRM identification method, MOPAT (motif pair tree), which identifies CRMs through the identification of motif modules, groups of motifs co-occurring in multiple CRMs. It can identify 'orthologous' CRMs without multiple alignments. It can also find CRMs given a large number of known motifs. We have applied this method to mouse developmental genes, and have evaluated the predicted CRMs and motif modules by microarray expression data and known interacting motif pairs. We show that the expression profiles of the genes containing CRMs of the same motif module correlate significantly better than those of a random set of genes do. We also show that the known interacting motif pairs are significantly included in our predictions. Compared with several current methods, our method shows better performance in identifying meaningful CRMs.
    Nucleic Acids Research 08/2008; 36(13):4488-97. · 8.03 Impact Factor
  • Article: Networking pathways unveils association between obesity and non-insulin dependent diabetes mellitus.
    Haiyan Hu, Xiaoman Li
    ABSTRACT: Genetically related health problems are often interrelated. Current practices to establish associations between diseases are expensive and can rarely reflect underlying molecular mechanisms. We propose a general framework to associate diseases by networking pathways. By applying our method to an association study of non-insulin dependent diabetes mellitus (NIDDM) and obesity, we demonstrate that our method can both identify signature pathways for each disease and establish a valid association between the two diseases.
    Pacific Symposium on Biocomputing 02/2008
  • Article: Dysregulated immune profiles for skin and dendritic cells are associated with increased host susceptibility to Haemophilus ducreyi infection in human volunteers.
    ABSTRACT: In experimentally infected human volunteers, the cutaneous immune response to Haemophilus ducreyi is orchestrated by serum, polymorphonuclear leukocytes, macrophages, T cells, and myeloid dendritic cells (DC). This response either leads to spontaneous resolution of infection or progresses to pustule formation, which is associated with the failure of phagocytes to ingest the organism and the presence of Th1 and regulatory T cells. In volunteers who are challenged twice, some subjects form at least one pustule twice (PP group), while others have all inoculated sites resolve twice (RR group). Here, we infected PP and RR subjects with H. ducreyi and used microarrays to profile gene expression in infected and wounded skin. The PP and RR groups shared a core response to H. ducreyi. Additional transcripts that signified effective immune function were differentially expressed in RR infected sites, while those that signified a hyperinflammatory, dysregulated response were differentially expressed in PP infected sites. To examine whether DC drove these responses, we profiled gene expression in H. ducreyi-infected and uninfected monocyte-derived DC. Both groups had a common response that was typical of a type 1 DC (DC1) response. RR DC exclusively expressed many additional transcripts indicative of DC1. PP DC exclusively expressed differentially regulated transcripts characteristic of DC1 and regulatory DC. The data suggest that DC from the PP and RR groups respond differentially to H. ducreyi. PP DC may promote a dysregulated T-cell response that contributes to phagocytic failure, while RR DC may promote a Th1 response that facilitates bacterial clearance.
    Infection and Immunity 01/2008; 75(12):5686-97. · 4.21 Impact Factor
  • Article: Transcriptional regulation in eukaryotic ribosomal protein genes.
    Haiyan Hu, Xiaoman Li
    ABSTRACT: Understanding ribosomal protein gene regulation provides a good avenue for understanding gene regulatory networks. Even after 5 decades of research on ribosomal protein gene regulation, little is known about how higher eukaryotic ribosomal protein genes are coordinately regulated at the transcriptional level. However, a few recent papers shed some light on this complicated problem.
    Genomics 11/2007; 90(4):421-3. · 3.02 Impact Factor
  • Article: A mixture model-based discriminate analysis for identifying ordered transcription factor binding site pairs in gene promoters directly regulated by estrogen receptor-alpha.
    ABSTRACT: To detect and select patterns of transcription factor binding sites (TFBSs) which distinguish genes directly regulated by estrogen receptor-alpha (ERalpha), we developed an innovative mixture model-based discriminate analysis for identifying ordered TFBS pairs. Biologically, our proposed new algorithm clearly suggests that TFBSs are not randomly distributed within ERalpha target promoters (P-value < 0.001). The up-regulated targets significantly (P-value < 0.01) possess TFBS pairs (DBP, MYC), (DBP, MYC/MAX heterodimer), (DBP, USF2) and (DBP, MYOGENIN); and down-regulated ERalpha target genes significantly (P-value < 0.01) possess TFBS pairs such as (DBP, c-ETS1-68), (DBP, USF2) and (DBP, MYOGENIN). Statistically, our proposed mixture model-based discriminate analysis can simultaneously perform TFBS pattern recognition, TFBS pattern selection, and target class prediction; such integrative power cannot be achieved by current methods. The software is available on request from the authors. Contact: lali@iupui.edu. Supplementary data are available at Bioinformatics online.
    Bioinformatics 10/2006; 22(18):2210-6. · 5.47 Impact Factor
  • Article: Estimating the repeat structure and length of DNA sequences using L-tuples.
    Xiaoman Li, Michael S Waterman
    ABSTRACT: In shotgun sequencing projects, the genome or BAC length is not always known. We approach estimating genome length by first estimating the repeat structure of the genome or BAC, sometimes of interest in its own right, on the basis of a set of random reads from a genome project. Moreover, we can find the consensus for repeat families before assembly. Our methods are based on the l-tuple content of the reads.
    Genome Research 09/2003; 13(8):1916-22. · 13.61 Impact Factor
  • Article: Estimating the Repeat Structure and Length of DNA Sequences Using ℓ-Tuples
    Xiaoman Li, Michael S Waterman
    ABSTRACT: In shotgun sequencing projects, the genome or BAC length is not always known. We approach estimating genome length by first estimating the repeat structure of the genome or BAC, sometimes of interest in its own right, on the basis of a set of random reads from a genome project. Moreover, we can find the consensus for repeat families before assembly. Our methods are based on the ℓ-tuple content of the reads.

Institutions

  • 2008–2013
    • University of Central Florida
      • Department of Electrical Engineering & Computer Science
      • Burnett School of Biomedical Sciences
      Orlando, FL, USA
  • 2010
    • CSU Mentor
      Long Beach, CA, USA
  • 2009
    • University of California, San Diego
      • Center for Research in Biological Systems (CRBS)
      San Diego, CA, USA
  • 2007–2008
    • Indiana University-Purdue University Indianapolis
      • Department of Informatics
      • Center for Computational Biology and Bioinformatics
      Indianapolis, IN, USA
  • 2003–2008
    • University of Southern California
      • Department of Biological Sciences
      • Department of Mathematics
      Los Angeles, CA, USA