Conference Paper

Nomograms for Visualization of Naive Bayesian Classifier.

Conference: Knowledge Discovery in Databases: PKDD 2004, 8th European Conference on Principles and Practice of Knowledge Discovery in Databases, Pisa, Italy, September 20-24, 2004, Proceedings
Source: DBLP
  • Source
    ABSTRACT: Saccharomyces cerevisiae strains from diverse natural habitats harbour a vast amount of phenotypic diversity, driven by interactions between yeast and the respective environment. In grape juice fermentations, strains are exposed to a wide array of biotic and abiotic stressors, which may lead to strain selection and generate naturally arising strain diversity. Certain phenotypes are of particular interest for the winemaking industry and could be identified by screening a large number of different strains. The objective of the present work was to use data mining approaches to identify those phenotypic tests that are most useful to predict a strain's potential for winemaking. We have constituted a S. cerevisiae collection comprising 172 strains of worldwide geographical origins or technological applications. Their phenotype was screened by considering 30 physiological traits that are important from an oenological point of view. Growth in the presence of potassium bisulphite, growth at 40°C, and resistance to ethanol contributed most to strain variability, as shown by the principal component analysis. In the hierarchical clustering of phenotypic profiles, the strains isolated from the same wines and vineyards were scattered throughout all clusters, whereas commercial winemaking strains tended to co-cluster. The Mann-Whitney test revealed significant associations between phenotypic results and a strain's technological application or origin. The naïve Bayesian classifier identified 3 of the 30 phenotypic tests, namely growth in iprodion (0.05 mg/mL), cycloheximide (0.1 µg/mL) and potassium bisulphite (150 mg/mL), as providing the most information for the assignment of a strain to the group of commercial strains. The probability of a strain being assigned to this group was 27% using the entire phenotypic profile and increased to 95% when only results from the three tests were considered. The results show the usefulness of computational approaches to simplify strain selection procedures.
    PLoS ONE 01/2013; 8(7):e66523. · 3.53 Impact Factor
  • Source
    ABSTRACT: The main advantage of unordered classification rules is in their power to spot and explain local regularities. However, using them in classification often poses problems due to conflicts between rules, when some resolution principle needs to be applied. On the other hand, most of the machine learning methods try to learn conflict-free hypotheses covering the whole domain space and are not concerned with single patterns only. In this paper we propose an algorithm named PILAR that combines the advantages of both approaches. Our algorithm aims at improving any machine learning algorithm by comparing its predictions with predictions of rules, and applying changes to the predictions of initial model when necessary. Moreover, if a dummy classifier (e.g. majority classifier) is used, then this procedure acts as a classifier from rules only and can be compared to other methods for classification from rules. We experimentally validated our method with two basic classification methods. In the first one dummy classifier was used and in the second logistic regression.
  • Source
    ABSTRACT: Recent research has demonstrated the utility of using supervised classification systems for automatic identification of low quality microarray data. However, this approach requires annotation of a large training set by a qualified expert. In this paper we demonstrate the utility of an unsupervised classification technique based on the Expectation-Maximization (EM) algorithm and naive Bayes classification. On our test set, this system exhibits performance comparable to that of an analogous supervised learner constructed from the same training data. Keywords: microarray, quality control, EM algorithm, Naive Bayes
