Conference Paper
Probabilistic topic modeling for genomic data interpretation.
Coll. of Inf. Sci. & Technol., Drexel Univ., Philadelphia, PA, USA
DOI: 10.1109/BIBM.2010.5706554 Conference: 2010 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2010, Hong Kong, China, 18-21 December 2010, Proceedings Source: DBLP

Article: Multi-Objective Topic Modeling
ABSTRACT: Topic Modeling (TM) is a rapidly growing area at the interface of text mining, artificial intelligence and statistical modeling that is increasingly deployed to address the 'information overload' associated with extensive text repositories. The goal in TM is typically to infer a rich yet intuitive summary model of a large document collection, indicating a specific set of topics that characterizes the collection (each topic being a probability distribution over words), along with the degree to which each individual document is concerned with each topic. The model then supports segmentation, clustering, profiling, browsing, and many other tasks. Current approaches to TM, dominated by Latent Dirichlet Allocation (LDA), assume a topic-driven document generation process and find a model that maximizes the likelihood of the data with respect to this process. This is clearly sensitive to any mismatch between the 'true' generating process and the statistical model, while it is also clear that the quality of a topic model is multi-faceted and complex. Individual topics should be intuitively meaningful, sensibly distinct, and free of noise. Here we investigate multi-objective approaches to topic modeling, which attempt to infer coherent topic models by navigating the trade-offs between objectives that are oriented towards coherence as well as coverage of the corpus at hand. Comparisons with LDA show that adoption of multi-objective evolutionary algorithm (MOEA) approaches enables significantly more coherent topics than LDA, consequently enhancing the use and interpretability of these models in a range of applications, without any significant degradation in the models' generalization ability.
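The topic-driven document generation process that the abstract attributes to LDA can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and does not come from the paper: the function names (`sample_dirichlet`, `sample_index`, `generate_corpus`), the hyperparameters `alpha` and `beta`, and the toy vocabulary. The sketch covers generation only; it does not perform inference and does not implement the multi-objective approach.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one Dirichlet sample by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def sample_index(probs, rng):
    """Draw an index with probability proportional to probs."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def generate_corpus(vocab, n_topics, n_docs, doc_len,
                    alpha=0.5, beta=0.5, seed=0):
    """Generate a toy corpus following the LDA generative story.

    alpha and beta are hypothetical symmetric Dirichlet hyperparameters.
    """
    rng = random.Random(seed)
    # Each topic is a probability distribution over the vocabulary.
    topics = [sample_dirichlet([beta] * len(vocab), rng)
              for _ in range(n_topics)]
    doc_topic, docs = [], []
    for _ in range(n_docs):
        # Each document has its own mixture of topics.
        theta = sample_dirichlet([alpha] * n_topics, rng)
        words = []
        for _ in range(doc_len):
            z = sample_index(theta, rng)      # pick a topic for this word
            w = sample_index(topics[z], rng)  # pick a word from that topic
            words.append(vocab[w])
        doc_topic.append(theta)
        docs.append(words)
    return topics, doc_topic, docs


# Toy vocabulary, purely illustrative.
vocab = ["gene", "protein", "pathway", "genome", "sample", "taxon"]
topics, doc_topic, docs = generate_corpus(vocab, n_topics=2, n_docs=3, doc_len=8)
```

Here `topics` corresponds to the per-topic word distributions and `doc_topic` to the degree to which each document is concerned with each topic; fitting a topic model means recovering both from `docs` alone.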
Conference Paper: Identifying enterotype in human microbiome by decomposing probabilistic topics into components
ABSTRACT: Discovering the global structure of microbial communities using large-scale metagenomes is a significant challenge in the post-genomics era. Data-driven methods such as dimension reduction have been shown to be useful when applied to a metagenomic profile matrix, which summarizes the abundance of functional or taxonomic categories in metagenomic samples. Analogously, model-driven methods such as probabilistic topic models (PTMs) have been used to build a generative model that simulates the generation of a microbial community based on metagenomic profiles. Data-driven methods are direct and simple, and they provide intuitive visualization and understanding of metagenomic profiles. Model-driven methods are often complicated but give a generative mechanism for the microbial community, which is helpful in understanding the generating process of complex microbial ecology. However, results from model-driven methods are usually hard to visualize and less intuitive to understand. We developed a new computational framework to incorporate the strengths of data-driven methods into model-based methods and applied the framework to discover and interpret enterotypes in the human microbiome.
Bioinformatics and Biomedicine (BIBM), 2012 IEEE International Conference on; 01/2012