Conference Paper

Nomograms for Visualization of Naive Bayesian Classifier.

Conference: Knowledge Discovery in Databases: PKDD 2004, 8th European Conference on Principles and Practice of Knowledge Discovery in Databases, Pisa, Italy, September 20-24, 2004, Proceedings
Source: DBLP
  • ABSTRACT: Patent databases provide valuable information for technology management. However, the rapid growth in the number of patent documents, the length of the texts, their dense technical terminology, and the complicated relationships among patents mean that analysis takes a great deal of human effort. As a result, an automated system that assists inventors in patent analysis and supports technological innovation is in great demand. In this paper, a Semantic-based Intellectual Property Management System (SIPMS) is developed to support the management of intellectual property (IP). It incorporates semantic analysis and text mining techniques for processing and analyzing patent documents. The method differs from traditional technology management tools in its knowledge base: instead of eliciting knowledge from domain experts, it adopts global patent databases as sources of knowledge. The system enables users to search for existing patent documents or other relevant IP documents related to a potential new invention, and it supports invention by revealing the relationships and patterns among a group of IP documents. The method has been evaluated by benchmarking its performance against a traditional text mining technique and has been successfully implemented at a selected reference site.
    Engineering Applications of Artificial Intelligence 12/2011; 24(8):1510–1520. DOI:10.1016/j.engappai.2011.05.009 · 1.96 Impact Factor
  • ABSTRACT: Saccharomyces cerevisiae strains from diverse natural habitats harbour a vast amount of phenotypic diversity, driven by interactions between the yeast and its environment. In grape juice fermentations, strains are exposed to a wide array of biotic and abiotic stressors, which may lead to strain selection and generate naturally arising strain diversity. Certain phenotypes are of particular interest for the winemaking industry and could be identified by screening a large number of different strains. The objective of the present work was to use data mining approaches to identify those phenotypic tests that are most useful for predicting a strain's potential for winemaking. We assembled a S. cerevisiae collection comprising 172 strains of worldwide geographical origins or technological applications. Their phenotypes were screened by considering 30 physiological traits that are important from an oenological point of view. Growth in the presence of potassium bisulphite, growth at 40°C, and resistance to ethanol contributed most to strain variability, as shown by principal component analysis. In the hierarchical clustering of phenotypic profiles, the strains isolated from the same wines and vineyards were scattered throughout all clusters, whereas commercial winemaking strains tended to co-cluster. The Mann-Whitney test revealed significant associations between phenotypic results and a strain's technological application or origin. A naïve Bayesian classifier identified 3 of the 30 phenotypic tests, namely growth in iprodion (0.05 mg/mL), cycloheximide (0.1 µg/mL) and potassium bisulphite (150 mg/mL), as providing the most information for assigning a strain to the group of commercial strains. The probability of a strain being assigned to this group was 27% using the entire phenotypic profile and increased to 95% when only the results of these three tests were considered. Results show the usefulness of computational approaches to simplify strain selection procedures.
    PLoS ONE 07/2013; 8(7):e66523. DOI:10.1371/journal.pone.0066523 · 3.53 Impact Factor
  • ABSTRACT: Genome sequencing is essential to understand individual variation and to study the mechanisms that explain relations between genotype and phenotype. The accumulated knowledge from large-scale genome sequencing projects of Saccharomyces cerevisiae isolates is being used to study the mechanisms that explain such relations. Our objective was to undertake genetic characterization of 172 S. cerevisiae strains from different geographical origins and technological groups, using 11 polymorphic microsatellites, and to computationally relate these data to the results of 30 phenotypic tests. Genetic characterization revealed 280 alleles, with microsatellite ScAAT1 contributing the most to intra-strain variability, together with alleles 20, 9 and 16 from microsatellites ScAAT4, ScAAT5 and ScAAT6. These microsatellite allelic profiles are characteristic of both the phenotype and the origin of yeast strains. We confirm the strength of these associations by constructing and cross-validating computational models that can predict the technological application and origin of a strain from its microsatellite allelic profile. Associations between microsatellites and specific phenotypes were scored using information gain ratio, and significant findings were confirmed by permutation tests and estimation of the false discovery rate. The phenotypes associated with a higher number of alleles were the capacity to resist sulphur dioxide (tested by the capacity to grow in the presence of potassium bisulphite) and the presence of galactosidase activity. Our study demonstrates the utility of computational modelling for estimating a strain's technological group and phenotype from microsatellite allelic combinations, as a tool for preliminary yeast strain selection.
    Yeast 07/2014; 31(7). DOI:10.1002/yea.3016 · 1.74 Impact Factor