Conference Paper

Scaling Up the Accuracy of Bayesian Network Classifiers by M-Estimate

DOI: 10.1007/978-3-540-74205-0_52 Conference: Advanced Intelligent Computing Theories and Applications. With Aspects of Artificial Intelligence, Third International Conference on Intelligent Computing, ICIC 2007, Qingdao, China, August 21-24, 2007, Proceedings
Source: DBLP


In learning Bayesian network classifiers, estimating probabilities from a given set of training examples is crucial. In many cases, probabilities can be estimated as the fraction of times an event is observed to occur over the total number of opportunities. However, when training examples are scarce, this estimation method inevitably suffers from the zero-frequency problem. To avoid this practical problem, the Laplace estimate is usually used. The m-estimate is another well-known probability estimation method, so a natural question is whether a Bayesian network classifier with the m-estimate can perform even better. To answer this question, we single out a particular m-estimate method and empirically investigate its effect on various Bayesian network classifiers, such as Naive Bayes (NB), Tree Augmented Naive Bayes (TAN), Averaged One-Dependence Estimators (AODE), and Hidden Naive Bayes (HNB). Our experiments show that the classifiers with our m-estimate perform better than their counterparts with the Laplace estimate.
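
For context, the probability estimates contrasted in the abstract have standard closed forms: the raw frequency n_c/n, the Laplace estimate (n_c + 1)/(n + k) over k possible values, and the m-estimate (n_c + m*p)/(n + m) with prior p and equivalent sample size m. The Python sketch below is illustrative only: it is not the authors' code, and the choice p = 1/k with m = 1 is an assumption, not necessarily the particular m-estimate the paper singles out.

def frequency_estimate(n_c, n):
    # Raw relative frequency; yields 0 for any unseen event,
    # which is exactly the zero-frequency problem.
    return n_c / n if n > 0 else 0.0

def laplace_estimate(n_c, n, k):
    # Laplace (add-one) smoothing over k possible attribute values.
    return (n_c + 1) / (n + k)

def m_estimate(n_c, n, m, p):
    # Blends the observed frequency with a prior p, weighted as if m
    # virtual examples had been seen. With p = 1/k and m = k this
    # reduces exactly to the Laplace estimate.
    return (n_c + m * p) / (n + m)

# Unseen event: n_c = 0 out of n = 10 examples, k = 3 attribute values.
print(frequency_estimate(0, 10))        # 0.0    -> zero-frequency problem
print(laplace_estimate(0, 10, 3))       # 0.0769...
print(m_estimate(0, 10, m=1.0, p=1/3))  # 0.0303...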

    • "These extreme probabilities are " very strong " and may affect the decision process. In such a case, one can use Laplace estimator [Jiang et al., 2007]. "
    ABSTRACT: In this paper, we study Bayesian networks (BNs) for form identification based on partially filled fields. The approach uses electronic ink-tracing files without any information about form structure. Given a form format, the ink-tracing files are used to build the BN by providing the possible relationships between corresponding fields via conditional probabilities, going from individual fields up to complete model construction. To simplify the BN, we sub-divide a single form into three different areas: header, body, and footer, and integrate them together, studying three fundamental BN learning algorithms: Naive, Peter & Clark (PC), and maximum weighted spanning tree (MWST). Under this framework, we validate the approach on a real-world industrial problem, i.e., electronic note-taking in form processing. The approach provides satisfactory results, attesting to the interest of BNs for handling incomplete form analysis problems.
    International Journal of Machine Learning and Cybernetics 06/2014; 6(3). DOI:10.1007/s13042-014-0234-4
    • "is another method to estimate probability, which has already been applied to improve the class probability estimation of Bayesian classifiers successfully [3]. In this paper, we try to investigate its application in decision tree learning. "
    ABSTRACT: The decision tree is one of the most effective and widely used models for classification and ranking, and has received a great deal of attention from researchers in data mining and machine learning. A critical problem in decision tree learning is how to estimate class-membership probabilities from decision trees. In this paper, we first survey class probability estimation methods, including the maximum-likelihood estimate, the Laplace estimate, the m-estimate, the similarity-weighted estimate, and the naive Bayes-based estimate. Then, we provide an empirical study of the classification and ranking performance of decision trees built with different class probability estimation methods. Experimental results on a large number of UCI data sets verify our conclusions.
    07/2011; 6(7):1368-1373. DOI:10.4304/jsw.6.7.1368-1373
  • ABSTRACT: Frequent Itemsets Mining Classifier (FISC) is an improved Bayesian classifier that averages all classifiers built from frequent itemsets. Since estimating probabilities from a given set of training examples is crucial in learning Bayesian network classifiers, and the m-estimate has been shown to scale up the accuracy of many Bayesian classifiers, a natural question is whether FISC with the m-estimate can perform even better. In response to this question, we aim to scale up the accuracy of FISC by the m-estimate and propose new probability estimation formulas. The experimental results show that the Laplace estimate used in the original FISC does not perform very well, while our m-estimate greatly scales up the accuracy; it even outperforms the other outstanding Bayesian classifiers used for comparison. Keywords: Frequent Itemsets Mining Classifier, estimating probabilities, Laplace estimate, m-estimate
    10/2010: pages 357-364;