Conference Paper

Nomograms for Visualization of Naive Bayesian Classifier.

Conference: Knowledge Discovery in Databases: PKDD 2004, 8th European Conference on Principles and Practice of Knowledge Discovery in Databases, Pisa, Italy, September 20-24, 2004, Proceedings
Source: DBLP

Available from: Blaz Zupan, Aug 21, 2014
  • Source
    • "This is done by means of pie charts which summarize the probability distribution of each feature. Nomograms (graphical representations of numerical relationships) are used by Mozina et al. [8] to visualize a NB classifier. Besides enabling prediction, the NB nomogram reveals the structure of the model and the relative influences of the feature values to the class probability. "
    ABSTRACT: The two-dimensional representation of documents, which places each document in a two-dimensional Cartesian plane, has proved to be a valid visualization tool for Automated Text Categorization (ATC): it helps users understand the relationships between categories of textual documents, visually audit the classifier, and identify suspicious training data. This paper analyzes a specific use of this visualization approach in the case of the Naive Bayes (NB) model for text classification and the Binary Independence Model (BIM) for text retrieval. For text categorization, the equation for the classification decision has to be rewritten so that each coordinate of a document is the sum of two addends: a variable component, P(d|ci), and a constant component, P(ci), the prior of the category. When the documents are plotted in the Cartesian plane according to this formulation, a constant shift along the x-axis and the y-axis becomes visible. How evident this shift is depends on which NB model, Bernoulli or multinomial, is chosen. For text retrieval, the same reformulation can be applied in the case of the BIM model. The visualization helps to explain the decisions taken to order the documents, in particular in the case of relevance feedback.
    International Journal of Approximate Reasoning 07/2009; 50(7):945-956. DOI:10.1016/j.ijar.2009.01.002
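
The reformulation described in the abstract above can be illustrated with a minimal sketch. It is not the cited paper's implementation. It assumes a multinomial NB model with Laplace smoothing, and it works in log space so that each coordinate really is a sum of a variable term, log P(d|ci), and a constant term, log P(ci). The function names `train_multinomial` and `coordinates` and the toy documents are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def train_multinomial(docs, labels, target):
    """Estimate log priors and Laplace-smoothed log term probabilities
    for the target category (True) versus all other documents (False)."""
    vocab = {w for d in docs for w in d}
    counts = {True: Counter(), False: Counter()}
    n_docs = {True: 0, False: 0}
    for d, y in zip(docs, labels):
        side = (y == target)
        counts[side].update(d)
        n_docs[side] += 1
    model = {}
    for side in (True, False):
        total = sum(counts[side].values())
        model[side] = {
            # Constant component: log prior of the category.
            "log_prior": math.log(n_docs[side] / len(docs)),
            # Laplace smoothing (an assumption of this sketch).
            "log_term": {
                w: math.log((counts[side][w] + 1) / (total + len(vocab)))
                for w in vocab
            },
        }
    return model


def coordinates(doc, model):
    """Return (x, y): x scores the document under the target category,
    y under its complement. Each is variable part + constant prior."""
    def score(side):
        m = model[side]
        # Variable component: log P(d|c); out-of-vocabulary words are skipped.
        variable = sum(m["log_term"][w] for w in doc if w in m["log_term"])
        return variable + m["log_prior"]
    return score(True), score(False)


# Toy data, invented for this example.
docs = [["goal", "match"], ["match", "team"],
        ["vote", "election"], ["election", "party", "vote"]]
labels = ["sport", "sport", "politics", "politics"]

model = train_multinomial(docs, labels, target="sport")
x, y = coordinates(["match", "goal"], model)
# The document is assigned to the target category when x > y,
# i.e. when its point falls below the diagonal of the plane.
print(x, y, x > y)
```

Under these assumptions, changing the prior of a category moves every document by the same amount along the corresponding axis, which is the constant shift the abstract refers to.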
  • Source
    • "In recent years, the user modeling techniques are investigated in various aspects. Many approaches, in the representation of user model, for instance, the vector space model (Salton, Wong and Yang, 1975), ontology (Vallet, et al., 2005); techniques in the machine learning, Tf-idf (Salton and Buckley, 1988), Bayes classification (Mozina, et al., 2004); genetic algorithm (Mitchell, 1996), neural networks methods (Gardner and Derrida, 1988) in the model updating, have been proposed and developed. "
    ABSTRACT: People spend far more time searching for information on the Internet than using it, because the desired information is often buried in a long list of search results. Personalized Internet access is a feasible solution to this search vs. use dilemma, as it helps identify the web documents users truly need. A user's interests are usually represented by a profile. This research proposes an improved vector space model representation to make the management of user interests more efficient. Building on this, it further proposes a method for user multi-interest modeling integrated with a semantic similar network (SSN). It studies feature selection in user modeling and proposes a feature selection method combining TF and TF-IDF, which is shown to perform better in the test. Finally, a complete module design is presented, which provides a personalized recommendation system for practical applications.
  • Source
    ABSTRACT: Recent research has demonstrated the utility of using supervised classification systems for automatic identification of low quality microarray data. However, this approach requires annotation of a large training set by a qualified expert. In this paper we demonstrate the utility of an unsupervised classification technique based on the Expectation-Maximization (EM) algorithm and naive Bayes classification. On our test set, this system exhibits performance comparable to that of an analogous supervised learner constructed from the same training data. Keywords: microarray, quality control, EM algorithm, Naive Bayes