Vasant Honavar

M.S., Ph.D.
Iowa State University · Computer Science

Research interests

  • Interests
    Automated Reasoning, Machine Learning

Publications

  • 3.43
    Impact points
    Protein-RNA interface residue prediction using machine learning: an assessment of the state of the art.

    Rasna R Walia, Cornelia Caragea, Benjamin A Lewis, Fadi G Towfic, Michael Terribilini, Yasser El-Manzalawy, Drena Dobbs, Vasant Honavar

    BMC bioinformatics. 05/2012; 13(1):89.

    ABSTRACT: BACKGROUND: RNA molecules play diverse functional and structural roles in cells. They function as messengers for transferring genetic information from DNA to proteins, as the primary genetic material in many viruses, as catalysts (ribozymes) important for protein synthesis and RNA processi... [more] ABSTRACT: BACKGROUND: RNA molecules play diverse functional and structural roles in cells. They function as messengers for transferring genetic information from DNA to proteins, as the primary genetic material in many viruses, as catalysts (ribozymes) important for protein synthesis and RNA processing, and as essential and ubiquitous regulators of gene expression in living organisms. Many of these functions depend on precisely orchestrated interactions between RNA molecules and specific proteins in cells. Understanding the molecular mechanisms by which proteins recognize and bind RNA is essential for comprehending the functional implications of these interactions, but the recognition 'code' that mediates interactions between proteins and RNA is not yet understood. Success in deciphering this code would dramatically impact the development of new therapeutic strategies for intervening in devastating diseases such as AIDS and cancer. Because of the high cost of experimental determination of protein-RNA interfaces, there is an increasing reliance on statistical machine learning methods for training predictors of RNA-binding residues in proteins. However, because of differences in the choice of datasets, performance measures, and data representations used, it has been difficult to obtain an accurate assessment of the current state of the art in protein-RNA interface prediction. RESULTS: We provide a review of published approaches for predicting RNA-binding residues in proteins and a systematic comparison and critical assessment of protein-RNA interface residue predictors trained using these approaches on three carefully curated non-redundant datasets. We directly compare two widely used machine learning algorithms (Naive Bayes (NB) and Support Vector Machine (SVM)) using three different data representations in which features are encoded using either sequence- or structure-based windows. Our results show that (i) Sequence-based classifiers that use a position-specific scoring matrix (PSSM)-based representation (PSSMSeq) outperform those that use an amino acid identity based representation (IDSeq) or a smoothed PSSM (SmoPSSMSeq); (ii) Structure-based classifiers that use smoothed PSSM representation (SmoPSSMStr) outperform those that use PSSM (PSSMStr) as well as sequence identity based representation (IDStr). PSSMSeq classifiers, when tested on an independent test set of 44 proteins, achieve performance that is comparable to that of three state-of-the-art structure-based predictors (including those that exploit geometric features) in terms of Matthews Correlation Coefficient (MCC), although the structure-based methods achieve substantially higher Specificity (albeit at the expense of Sensitivity) compared to sequence-based methods. We also find that the expected performance of the classifiers on a residue level can be markedly different from that on a protein level. Our experiments show that the classifiers trained on three different non-redundant protein-RNA interface datasets achieve comparable cross-validation performance. However, we find that the results are significantly affected by differences in the distance threshold used to define interface residues. CONCLUSIONS: Our results demonstrate that protein-RNA interface residue predictors that use a PSSM-based encoding of sequence windows outperform classifiers that use other encodings of sequence windows. While structure-based methods that exploit geometric features can yield significant increases in the Specificity of protein-RNA interface residue predictions, such increases are offset by decreases in Sensitivity. These results underscore the importance of comparing alternative methods using rigorous statistical procedures, multiple performance measures, and datasets that are constructed based on several alternative definitions of interface residues and redundancy cutoffs as well as including evaluations on independent test sets into the comparisons.
  • 3.43
    Impact points
    Predicting protein-protein interface residues using local surface structural similarity.

    Rafael A Jordan, Yasser El-Manzalawy, Drena Dobbs, Vasant Honavar

    BMC bioinformatics. 03/2012; 13(1):41.

    ABSTRACT: BACKGROUND: Identification of the residues in protein-protein interaction sites has a significant impact in problems such as drug discovery. Motivated by the observation that the set of interface residues of a protein tend to be conserved even among remote structural homologs, we introduce... [more] ABSTRACT: BACKGROUND: Identification of the residues in protein-protein interaction sites has a significant impact in problems such as drug discovery. Motivated by the observation that the set of interface residues of a protein tend to be conserved even among remote structural homologs, we introduce PrISE, a family of local structural similarity-based computational methods for predicting protein-protein interface residues. RESULTS: We present a novel representation of the surface residues of a protein in the form of structural elements. Each structural element consists of a central residue and its surface neighbors. The PrISE family of interface prediction methods use a representation of structural elements that captures the atomic composition and accessible surface area of the residues that make up each structural element. Each of the members of the PrISE methods identifies for each structural element in the query protein, a collection of similar structural elements in its repository of structural elements and weights them according to their similarity with the structural element of the query protein. PrISE_L relies on the similarity between structural elements (i.e. local structural similarity). PrISE_G relies on the similarity between protein surfaces (i.e. general structural similarity). PrISE_C, combines local structural similarity and general structural similarity to predict interface residues. These predictors label the central residue of a structural element in a query protein as an interface residue if a weighted majority of the structural elements that are similar to it are interface residues, and as a non-interface residue otherwise. The results of our experiments using three representative benchmark datasets show that the PrISE_C outperforms PrISE_L and PrISE_G; and that PrISE_C is highly competitive with state-of-the-art structure-based methods for predicting protein-protein interface residues. Our comparison of PrISE_C with PredUs , a recently developed method for predicting interface residues of a query protein based on the known interface residues of its (global) structural homologs, shows that performance superior or comparable to that of PredUs can be obtained using only local surface structural similarity. PrISE_C is available as a Web server at http://prise.cs.iastate.edu/ CONCLUSIONS: Local surface structural similarity based methods offer a simple, efficient, and effective approach to predict protein-protein interface residues.
  • 3.43
    Impact points
    Predicting RNA-Protein Interactions Using Only Sequence Information.

    Usha K Muppirala, Vasant G Honavar, Drena Dobbs

    BMC bioinformatics. 12/2011; 12:489.

    ABSTRACT: RNA-protein interactions (RPIs) play important roles in a wide variety of cellular processes, ranging from transcriptional and post-transcriptional regulation of gene expression to host defense against pathogens. High throughput experiments to identify RNA-protein interactions are beginnin... [more] ABSTRACT: RNA-protein interactions (RPIs) play important roles in a wide variety of cellular processes, ranging from transcriptional and post-transcriptional regulation of gene expression to host defense against pathogens. High throughput experiments to identify RNA-protein interactions are beginning to provide valuable information about the complexity of RNA-protein interaction networks, but are expensive and time consuming. Hence, there is a need for reliable computational methods for predicting RNA-protein interactions. We propose RPISeq, a family of classifiers for predicting RNA-protein interactions using only sequence information. Given the sequences of an RNA and a protein as input, RPIseq predicts whether or not the RNA-protein pair interact. The RNA sequence is encoded as a normalized vector of its ribonucleotide 4-mer composition, and the protein sequence is encoded as a normalized vector of its 3-mer composition, based on a 7-letter reduced alphabet representation. Two variants of RPISeq are presented: RPISeq-SVM, which uses a Support Vector Machine (SVM) classifier and RPISeq-RF, which uses a Random Forest classifier. On two non-redundant benchmark datasets extracted from the Protein-RNA Interface Database (PRIDB), RPISeq achieved an AUC (Area Under the Receiver Operating Characteristic (ROC) curve) of 0.96 and 0.92. On a third dataset containing only mRNA-protein interactions, the performance of RPISeq was competitive with that of a published method that requires information regarding many different features (e.g., mRNA half-life, GO annotations) of the putative RNA and protein partners. In addition, RPISeq classifiers trained using the PRIDB data correctly predicted the majority (57-99%) of non-coding RNA-protein interactions in NPInter-derived networks from E. coli, S. cerevisiae, D. melanogaster, M. musculus, and H. sapiens. Our experiments with RPISeq demonstrate that RNA-protein interactions can be reliably predicted using only sequence-derived information. RPISeq offers an inexpensive method for computational construction of RNA-protein interaction networks, and should provide useful insights into the function of non-coding RNAs. RPISeq is freely available as a web-based server at http://pridb.gdcb.iastate.edu/RPISeq/.
  • 2.25
    Impact points
    Predicting MHC-II Binding Affinity Using Multiple Instance Regression

    Y. EL-Manzalawy, D. Dobbs, V. Honavar

    Computational Biology and Bioinformatics, IEEE/ACM Transactions on. 09/2011;

    Reliably predicting the ability of antigen peptides to bind to major histocompatibility complex class II (MHC-II) molecules is an essential step in developing new vaccines. Uncovering the amino acid sequence correlates of the binding affinity of MHC-II binding peptides is important for understanding... [more] Reliably predicting the ability of antigen peptides to bind to major histocompatibility complex class II (MHC-II) molecules is an essential step in developing new vaccines. Uncovering the amino acid sequence correlates of the binding affinity of MHC-II binding peptides is important for understanding pathogenesis and immune response. The task of predicting MHC-II binding peptides is complicated by the significant variability in their length. Most existing computational methods for predicting MHC-II binding peptides focus on identifying a nine amino acids core region in each binding peptide. We formulate the problems of qualitatively and quantitatively predicting flexible length MHC-II peptides as multiple instance learning and multiple instance regression problems, respectively. Based on this formulation, we introduce MHCMIR, a novel method for predicting MHC-II binding affinity using multiple instance regression. We present results of experiments using several benchmark data sets that show that MHCMIR is competitive with the state-of-the-art methods for predicting MHC-II binding peptides. An online web server that implements the MHCMIR method for MHC-II binding affinity prediction is freely accessible at http://ailab.cs.iastate.edu/mhcmir.
  • Abstraction Super-structuring Normal Forms: Towards a Theory of Structural Induction

    Adrian Silvescu, Vasant Honavar

    07/2011;

    Induction is the process by which we obtain predictive laws or theories or models of the world. We consider the structural aspect of induction. We answer the question as to whether we can find a finite and minmalistic set of operations on structural elements in terms of which any theory can be expre... [more] Induction is the process by which we obtain predictive laws or theories or models of the world. We consider the structural aspect of induction. We answer the question as to whether we can find a finite and minmalistic set of operations on structural elements in terms of which any theory can be expressed. We identify abstraction (grouping similar entities) and super-structuring (combining topologically e.g., spatio-temporally close entities) as the essential structural operations in the induction process. We show that only two more structural operations, namely, reverse abstraction and reverse super-structuring (the duals of abstraction and super-structuring respectively) suffice in order to exploit the full power of Turing-equivalent generative grammars in induction. We explore the implications of this theorem with respect to the nature of hidden variables, radical positivism and the 2-century old claim of David Hume about the principles of connexion among ideas.
  • 3.43
    Impact points
    HomPPI: a class of sequence homology based protein-protein interface prediction methods.

    Li C Xue, Drena Dobbs, Vasant Honavar

    BMC bioinformatics. 06/2011; 12:244.

    Although homology-based methods are among the most widely used methods for predicting the structure and function of proteins, the question as to whether interface sequence conservation can be effectively exploited in predicting protein-protein interfaces has been a subject of debate. We studied more... [more] Although homology-based methods are among the most widely used methods for predicting the structure and function of proteins, the question as to whether interface sequence conservation can be effectively exploited in predicting protein-protein interfaces has been a subject of debate. We studied more than 300,000 pair-wise alignments of protein sequences from structurally characterized protein complexes, including both obligate and transient complexes. We identified sequence similarity criteria required for accurate homology-based inference of interface residues in a query protein sequence.Based on these analyses, we developed HomPPI, a class of sequence homology-based methods for predicting protein-protein interface residues. We present two variants of HomPPI: (i) NPS-HomPPI (Non partner-specific HomPPI), which can be used to predict interface residues of a query protein in the absence of knowledge of the interaction partner; and (ii) PS-HomPPI (Partner-specific HomPPI), which can be used to predict the interface residues of a query protein with a specific target protein.Our experiments on a benchmark dataset of obligate homodimeric complexes show that NPS-HomPPI can reliably predict protein-protein interface residues in a given protein, with an average correlation coefficient (CC) of 0.76, sensitivity of 0.83, and specificity of 0.78, when sequence homologs of the query protein can be reliably identified. NPS-HomPPI also reliably predicts the interface residues of intrinsically disordered proteins. Our experiments suggest that NPS-HomPPI is competitive with several state-of-the-art interface prediction servers including those that exploit the structure of the query proteins. The partner-specific classifier, PS-HomPPI can, on a large dataset of transient complexes, predict the interface residues of a query protein with a specific target, with a CC of 0.65, sensitivity of 0.69, and specificity of 0.70, when homologs of both the query and the target can be reliably identified. The HomPPI web server is available at http://homppi.cs.iastate.edu/. Sequence homology-based methods offer a class of computationally efficient and reliable approaches for predicting the protein-protein interface residues that participate in either obligate or transient interactions. For query proteins involved in transient interactions, the reliability of interface residue prediction can be improved by exploiting knowledge of putative interaction partners.
  • Scalable, updatable predictive models for sequence data

    N. Koul, N. Bui, V. Honavar

    Bioinformatics and Biomedicine (BIBM), 2010 IEEE International Conference on; 01/2011

    The emergence of data rich domains has led to an exponential growth in the size and number of data repositories, offering exciting opportunities to learn from the data using machine learning algorithms. In particular, sequence data is being made available at a rapid rate. In many applications, the l... [more] The emergence of data rich domains has led to an exponential growth in the size and number of data repositories, offering exciting opportunities to learn from the data using machine learning algorithms. In particular, sequence data is being made available at a rapid rate. In many applications, the learning algorithm may not have direct access to the entire dataset because of a variety of reasons such as massive data size or bandwidth limitation. In such settings, there is a need for techniques that can learn predictive models (e.g., classifiers) from large datasets without direct access to the data. We describe an approach to learn from massive sequence datasets using statistical queries. Specifically we show how Markov Models and Probabilistic Suffix Trees (PSTs) can be constructed from sequence databases that answer only a class of count queries. We analyze the query complexity (a measure of the number of queries needed) for constructing classifiers in such settings and outline some techniques to minimize the query complexity. We also show how some of the models can be updated in response to addition or deletion of subsets of sequences from the underlying sequence database.
  • Verifying Intervention Policies for Countering Infection Propagation over Networks: A Model Checking Approach

    G. Santhanam, Y. Suvorov, S. Basu, V. Honavar

    Proceedings of the Twenty-Fifth Conference on Artificial Intelligence. 01/2011;

    Spread of infections (diseases, ideas, etc.) in a network can be modeled as the evolution of states of nodes in a graph as a function of the states of their neighbors. Given an initial configuration of a network in which a subset of the nodes have been infected, and an infec- tion propagation functi... [more] Spread of infections (diseases, ideas, etc.) in a network can be modeled as the evolution of states of nodes in a graph as a function of the states of their neighbors. Given an initial configuration of a network in which a subset of the nodes have been infected, and an infec- tion propagation function that specifies how the states of the nodes evolve over time, we show how to use model checking to identify, verify, and evaluate the effective- ness of intervention policies for containing the propaga- tion of infection over such networks.
  • Learning Relational Bayesian Classifiers from RDF Data.

    Harris T. Lin, Neeraj Koul, Vasant Honavar

    The Semantic Web - ISWC 2011 - 10th International Semantic Web Conference, Bonn, Germany, October 23-27, 2011, Proceedings, Part I; 01/2011

  • Learning Relational Bayesian Classifiers from RDF Data

    H. Lin, N. Koul, V. Honavar

    Proceedings of the International Semantic Web Conference (ISWC 2011). Springer-Verlag Lecture Notes in Computer Science. 01/2011; 7031:389-404.

    The increasing availability of large RDF datasets offers an exciting opportunity to use such data to build predictive models using machine learning algorithms. However, the massive size and distributed nature of RDF data calls for approaches to learning from RDF data in a setting where the data can ... [more] The increasing availability of large RDF datasets offers an exciting opportunity to use such data to build predictive models using machine learning algorithms. However, the massive size and distributed nature of RDF data calls for approaches to learning from RDF data in a setting where the data can be accessed only through a query interface, e.g., the SPARQL endpoint of the RDF store. In applications where the data are subject to frequent updates, there is a need for algorithms that allow the predictive model to be incrementally updated in response to changes in the data. Furthermore, in some applications, the attributes that are relevant for specific prediction tasks are not known a priori and hence need to be discovered by the algorithm. We present an approach to learning Relational Bayesian Classifiers (RBCs) from RDF data that addresses such scenarios. Specifically, we show how to build RBCs from RDF data using statistical queries through the SPARQL endpoint of the RDF store. We compare the communication complexity of our algorithm with one that requires direct centralized access to the data and hence has to retrieve the entire RDF dataset from the remote location for pro- cessing. We establish the conditions under which the RBC models can be incrementally updated in response to addition or deletion of RDF data. We show how our approach can be extended to the setting where the attributes that are relevant for prediction are not known a priori, by selectively crawling the RDF data for attributes of interest. We pro- vide open source implementation and evaluate the proposed approach on several large RDF datasets.
  • Representing and Reasoning with Qualitative Preferences for Compositional Systems

    G. Santhanam, S. Basu, V. Honavar

    Journal of Artificial Intelligence Research Vol 42, pp. 211-274. 01/2011; 42:211-274.

    Many applications, e.g., Web service composition, complex system design, team forma- tion, etc., rely on methods for identifying collections of objects or entities satisfying some functional requirement. Among the collections that satisfy the functional requirement, it is often necessary to identify... [more] Many applications, e.g., Web service composition, complex system design, team forma- tion, etc., rely on methods for identifying collections of objects or entities satisfying some functional requirement. Among the collections that satisfy the functional requirement, it is often necessary to identify one or more collections that are optimal with respect to user preferences over a set of attributes that describe the non-functional properties of the collection. We develop a formalism that lets users express the relative importance among attributes and qualitative preferences over the valuations of each attribute. We define a dominance relation that allows us to compare collections of objects in terms of preferences over at- tributes of the objects that make up the collection. We establish some key properties of the dominance relation. In particular, we show that the dominance relation is a strict partial order when the intra-attribute preference relations are strict partial orders and the relative importance preference relation is an interval order. We provide algorithms that use this dominance relation to identify the set of most preferred collections. We show that under certain conditions, the algorithms are guaranteed to return only (sound), all (complete), or at least one (weakly complete) of the most preferred collections. We present results of simulation experiments comparing the proposed algorithms with respect to (a) the quality of solutions (number of most preferred solutions) produced by the algorithms, and (b) their performance and efficiency. We also explore some interesting conjectures suggested by the results of our experiments that relate the properties of the user preferences, the dominance relation, and the algorithms.
  • On the Utility of Curricula in Unsupervised Learning of Probabilistic Grammars

    K. Tu, V. Honavar

    Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence. 01/2011;

    We examine the utility of a curriculum (a means of presenting training samples in a meaningful order) in unsupervised learning of probabilistic grammars. We introduce the incremental construction hypoth- esis that explains the benefits of a curriculum in learning grammars and offers some useful insi... [more] We examine the utility of a curriculum (a means of presenting training samples in a meaningful order) in unsupervised learning of probabilistic grammars. We introduce the incremental construction hypoth- esis that explains the benefits of a curriculum in learning grammars and offers some useful insights into the design of curricula as well as learning algo- rithms. We present results of experiments with (a) carefully crafted synthetic data that provide support for our hypothesis and (b) natural language corpus that demonstrate the utility of curricula in unsuper- vised learning of probabilistic grammars.
  • Exemplar-based Robust Coherent Biclustering.

    K. Tu, X. Ouyang, D. Han, Y. Yu, V. Honavar

    Proceedings of the SIAM Conference on Data Mining. 01/2011;

    The biclustering, co-clustering, or subspace clustering prob- lem involves simultaneously grouping the rows and columns of a data matrix to uncover biclusters or sub-matrices of the data matrix that optimize a desired objective function. In coherent biclustering, the objective function contains a co... [more] The biclustering, co-clustering, or subspace clustering prob- lem involves simultaneously grouping the rows and columns of a data matrix to uncover biclusters or sub-matrices of the data matrix that optimize a desired objective function. In coherent biclustering, the objective function contains a coherence measure of the biclusters. We introduce a novel formulation of the coherent biclustering problem and use it to derive two algorithms. The first algorithm is based on loopy message passing; and the second relies on a greedy strategy yielding an algorithm that is significantly faster than the first. A distinguishing feature of these algorithms is that they identify an exemplar or a prototypical member of each bi-cluster. We note the interference from background elements in bi-clustering, and offer a means to circumvent such interference using additional regularization. Our ex- periments with synthetic as well as real-world datasets show that our algorithms are competitive with the current state- of-the-art algorithms for finding coherent bi-clusters.
  • Automata-Based Verification of Security Requirements of Composite Web Services

    Hongyu Sun, S. Basu, V. Honavar, R. Lutz

    Software Reliability Engineering (ISSRE), 2010 IEEE 21st International Symposium on; 12/2010

    With the increasing reliance of complex real-world applications on composite web services assembled from independently developed component services, there is a growing need for effective approaches to verifying that a composite service not only offers the required functionality but also satisfies th... [more] With the increasing reliance of complex real-world applications on composite web services assembled from independently developed component services, there is a growing need for effective approaches to verifying that a composite service not only offers the required functionality but also satisfies the desired non-functional requirements (NFRs). In high-assurance applications such as traffic control, medical decision support, and coordinated response to civil emergencies, of special concern are NFRs having to do with security, safety and reliability of composite services. Current approaches to verifying NFRs of composite services (as opposed to individual services) remain largely ad-hoc and informal in nature. In this paper we develop techniques for ensuring that a composite service meets the user-specified NFRs expressible in the form of hard constraints e.g., “response time has to be less than 5 minutes.” We introduce an automata-based framework for verifying that a composite service satisfies the desired NFRs based on the known guarantees regarding the non-functional properties of the component services. We further show how to improve the efficiency of verifying that a composite service indeed satisfies a desired set of NFRs by: (i) Exploiting information about the applicability of specific NFRs (e.g., security) only to certain subsets of the component services that make up a composite service to minimize the verification effort and (ii) Identifying inconsistencies between NFRs with overlapping scopes. We illustrate how our approach can be used to verify the security requirements for an Emergency Management System. We also show how the approach can be used to verify whether a composite service satisfies any desired set of NFRs that can be expressed in the form of hard constraints of a quantitative nature.
  • 7.48
    Impact points
    PRIDB: a Protein-RNA Interface Database.

    Benjamin A Lewis, Rasna R Walia, Michael Terribilini, Jeff Ferguson, Charles Zheng, Vasant Honavar, Drena Dobbs

    Nucleic acids research. 11/2010; 39(Database issue):D277-82.

    The Protein-RNA Interface Database (PRIDB) is a comprehensive database of protein-RNA interfaces extracted from complexes in the Protein Data Bank (PDB). It is designed to facilitate detailed analyses of individual protein-RNA complexes and their interfaces, in addition to automated generation of us... [more] The Protein-RNA Interface Database (PRIDB) is a comprehensive database of protein-RNA interfaces extracted from complexes in the Protein Data Bank (PDB). It is designed to facilitate detailed analyses of individual protein-RNA complexes and their interfaces, in addition to automated generation of user-defined data sets of protein-RNA interfaces for statistical analyses and machine learning applications. For any chosen PDB complex or list of complexes, PRIDB rapidly displays interfacial amino acids and ribonucleotides within the primary sequences of the interacting protein and RNA chains. PRIDB also identifies ProSite motifs in protein chains and FR3D motifs in RNA chains and provides links to these external databases, as well as to structure files in the PDB. An integrated JMol applet is provided for visualization of interacting atoms and residues in the context of the 3D complex structures. The current version of PRIDB contains structural information regarding 926 protein-RNA complexes available in the PDB (as of 10 October 2010). Atomic- and residue-level contact information for the entire data set can be downloaded in a simple machine-readable format. Also, several non-redundant benchmark data sets of protein-RNA complexes are provided. The PRIDB database is freely available online at http://bindr.gdcb.iastate.edu/PRIDB.
  • Learning in Presence of Ontology Mapping Errors

    N. Koul, V. Honavar

    Web Intelligence and Intelligent Agent Technology (WI-IAT), 2010 IEEE/WIC/ACM International Conference on; 10/2010

    The widespread use of ontologies to associate semantics with data has resulted in a growing interest in the problem of learning predictive models from data sources that use different ontologies to model the same underlying domain (world of interest). Learning from such semantically disparate data so... [more] The widespread use of ontologies to associate semantics with data has resulted in a growing interest in the problem of learning predictive models from data sources that use different ontologies to model the same underlying domain (world of interest). Learning from such semantically disparate data sources involves the use of a mapping to resolve semantic disparity among the ontologies used. Often, in practice, the mapping used to resolve the disparity may contain errors and as such the learning algorithms used in such a setting must be robust in presence of mapping errors. We reduce the problem of learning from semantically disparate data sources in the presence of mapping errors to a variant of the problem of learning in the presence of nasty classification noise. This reduction allows us to transfer theoretical results and algorithms from the latter to the former.
  • 1.96
    Impact points
    Methods for transcriptomic analyses of the porcine host immune response: application to Salmonella infection using microarrays.

    C K Tuggle, S M D Bearson, J J Uthe, T H Huang, O P Couture, Y F Wang, D Kuhar, J K Lunney, V Honavar

    Veterinary immunology and immunopathology. 10/2010; 138(4):280-91.

    Technological developments in both the collection and analysis of molecular genetic data over the past few years have provided new opportunities for an improved understanding of the global response to pathogen exposure. Such developments are particularly dramatic for scientists studying the pig, whe... [more] Technological developments in both the collection and analysis of molecular genetic data over the past few years have provided new opportunities for an improved understanding of the global response to pathogen exposure. Such developments are particularly dramatic for scientists studying the pig, where tools to measure the expression of tens of thousands of transcripts, as well as unprecedented data on the porcine genome sequence, have combined to expand our abilities to elucidate the porcine immune system. In this review, we describe these recent developments in the context of our work using primarily microarrays to explore gene expression changes during infection of pigs by Salmonella. Thus while the focus is not a comprehensive review of all possible approaches, we provide links and information on both the tools we use as well as alternatives commonly available for transcriptomic data collection and analysis of porcine immune responses. Through this review, we expect readers will gain an appreciation for the necessary steps to plan, conduct, analyze and interpret the data from transcriptomic analyses directly applicable to their research interests.
  • Scalable, updatable predictive models for sequence data.

    Neeraj Koul, Ngot Bui, Vasant Honavar

    2010 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2010, Hong Kong, China, 18 - 21 December 2010, Proceedings; 01/2010

  • Abstraction Augmented Markov Models.

    Cornelia Caragea, Adrian Silvescu, Doina Caragea, Vasant Honavar

    ICDM 2010, The 10th IEEE International Conference on Data Mining, Sydney, Australia, 14-17 December 2010; 01/2010

  • Identifying and Eliminating Inconsistencies in Mappings across Hierarchical Ontologies.

    Bhavesh Sanghvi, Neeraj Koul, Vasant Honavar

    On the Move to Meaningful Internet Systems, OTM 2010 - Confederated International Conferences: CoopIS, IS, DOA and ODBASE, Hersonissos, Crete, Greece, October 25-29, 2010, Proceedings, Part II; 01/2010

1 2 3 4 ... 15 Next »
288
Publications
36
Followers