Conference Paper

Nomograms for Visualization of Naive Bayesian Classifier.

Memorial Sloan-Kettering Cancer Center, New York, New York, United States
Conference: Knowledge Discovery in Databases: PKDD 2004, 8th European Conference on Principles and Practice of Knowledge Discovery in Databases, Pisa, Italy, September 20-24, 2004, Proceedings
Source: DBLP


Besides good predictive performance, the naive Bayesian classifier can also offer valuable insight into the structure of the training data and the effects of the attributes on the class probabilities. This structure may be effectively revealed through visualization of the classifier. We propose a new way to visualize the naive Bayesian model in the form of a nomogram. The advantages of the proposed method are simplicity of presentation, clear display of the effects of individual attribute values, and visualization of confidence intervals. Nomograms are intuitive and, when used for decision support, can provide a visual explanation of predicted probabilities. Finally, with a nomogram, a naive Bayesian model can be printed out and used for probability prediction without the use of a computer or calculator.
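As a rough illustration of why a naive Bayesian model lends itself to a nomogram, the sketch below uses the standard log-odds decomposition: the log odds of the target class equal the prior log odds plus one additive term per attribute value. It is a minimal sketch under stated assumptions, not necessarily the paper's exact construction. The function names, the Laplace smoothing parameter `alpha`, and the dictionary-based data layout are all hypothetical choices made for the example, and confidence intervals are not covered.

```python
import math
from collections import Counter, defaultdict


def fit_point_scores(rows, labels, target, alpha=1.0):
    """Return (prior_log_odds, scores) for a two-class naive Bayesian model.

    scores[attr][value] is log(P(value | target) / P(value | not target)),
    i.e. the additive "points" an attribute value contributes to the log odds.
    Assumes both classes occur in the data; alpha is Laplace smoothing
    (an assumption of this sketch).
    """
    n_pos = sum(1 for y in labels if y == target)
    n_neg = len(labels) - n_pos
    prior_log_odds = math.log(n_pos / n_neg)

    # counts[attr][is_target][value] = number of matching training rows
    counts = defaultdict(lambda: {True: Counter(), False: Counter()})
    for row, y in zip(rows, labels):
        for attr, value in row.items():
            counts[attr][y == target][value] += 1

    scores = {}
    for attr, by_class in counts.items():
        values = set(by_class[True]) | set(by_class[False])
        k = len(values)
        scores[attr] = {
            v: math.log((by_class[True][v] + alpha) / (n_pos + alpha * k))
            - math.log((by_class[False][v] + alpha) / (n_neg + alpha * k))
            for v in values
        }
    return prior_log_odds, scores


def predict_probability(prior_log_odds, scores, row):
    """Sum the points for the row's attribute values and map to a probability.

    Attribute values not seen in training contribute zero points.
    """
    total = prior_log_odds + sum(
        scores[attr].get(value, 0.0)
        for attr, value in row.items()
        if attr in scores
    )
    return 1.0 / (1.0 + math.exp(-total))
```

Because each attribute value maps to a fixed number of points, the per-attribute scores can be drawn as parallel scales, and the summed points can be read against a final log-odds-to-probability scale by hand.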



Available from: Blaz Zupan, Aug 21, 2014
  • Source
    • "Naïve Bayesian classification [11,12] is based on Bayesian Theorem. If a new record R is to be classified, Bayesian Theorem can be used to find the probability that it belongs to class C_i by using Eq. "

    Full-text · Article · Feb 2015
  • Source
    • "This is done by means of pie charts which summarize the probability distribution of each feature. Nomograms (graphical representations of numerical relationships) are used by Mozina et al. [8] to visualize a NB classifier. Besides enabling prediction, the NB nomogram reveals the structure of the model and the relative influences of the feature values to the class probability. "
    ABSTRACT: The two-dimensional representation of documents which allows documents to be represented in a two-dimensional Cartesian plane has proved to be a valid visualization tool for Automated Text Categorization (ATC) for understanding the relationships between categories of textual documents, and to help users to visually audit the classifier and identify suspicious training data. This paper analyzes a specific use of this visualization approach in the case of the Naive Bayes (NB) model for text classification and the Binary Independence Model (BIM) for text retrieval. For text categorization, a reformulation of the equation for the decision of classification has to be written in such a way that each coordinate of a document is the sum of two addends: a variable component P(d|ci), and a constant component P(ci), the prior of the category. When plotted in the Cartesian plane according to this formulation, the documents that are constantly shifted along the x-axis and the y-axis can be seen. This effect of shifting is more or less evident according to which NB model, Bernoulli or multinomial, is chosen. For text retrieval, the same reformulation can be applied in the case of the BIM model. The visualization helps to understand the decisions that are taken to order the documents, in particular in the case of relevance feedback.
    Full-text · Article · Jul 2009 · International Journal of Approximate Reasoning
  • Source
    • "In recent years, the user modeling techniques are investigated in various aspects. Many approaches, in the representation of user model, for instance, the vector space model (Salton, Wong and Yang, 1975), ontology (Vallet, et al., 2005); techniques in the machine learning, Tf-idf (Salton and Buckley, 1988), Bayes classification (Mozina, et al., 2004); genetic algorithm (Mitchell, 1996), neural networks methods (Gardner and Derrida, 1988) in the model updating, have been proposed and developed. "
    ABSTRACT: People spend far more time searching for information on the Internet than using it, because the desired information is often buried within a long list of search results. Personalized Internet access is a feasible solution to this search vs. use dilemma, as it helps identify the web documents users truly need. A user's interests are usually represented by a profile. In this research, an improved vector space model representation is proposed to make the management of user interests more efficient. Based on this, the research further proposes a method for user multi-interest modeling integrated with a semantic similar network (SSN). It studies feature selection in user modeling and proposes a feature selection method combining TF and TF-IDF that shows better performance in the test. Finally, a complete module design is presented, which provides a personalized recommendation system for practical applications.
    Full-text · Article · Jan 2009