Scaling Up the Accuracy of Bayesian Network Classifiers by M-Estimate.
ABSTRACT In learning Bayesian network classifiers, estimating probabilities from a given set of training examples is crucial. In many
cases, we can estimate a probability by the fraction of times an event is observed to occur over the total number of opportunities.
However, when training examples are scarce, this probability estimation method inevitably suffers from the zero-frequency
problem. To avoid this practical problem, the Laplace estimate is usually used to estimate probabilities. The m-estimate is
another well-known probability estimation method. Thus, a natural question is whether a Bayesian network classifier with
the m-estimate can perform even better. To answer this question, we single out a special m-estimate method and empirically
investigate its effect on various Bayesian network classifiers, such as Naive Bayes (NB), Tree Augmented Naive Bayes (TAN),
Averaged One-Dependence Estimators (AODE), and Hidden Naive Bayes (HNB). Our experiments show that the classifiers with our
m-estimate perform better than the ones with the Laplace estimate.
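The three estimators mentioned in the abstract can be compared with a minimal sketch. It uses the textbook form of the m-estimate, (count + m * prior) / (total + m); the toy data, the function names, and the choices m = 1 and a uniform prior are assumptions made for illustration and are not necessarily the special m-estimate studied in the paper.

```python
from collections import Counter
from fractions import Fraction


def frequency_estimate(count, total):
    """Relative frequency; returns 0 for values never seen in training."""
    return Fraction(count, total)


def laplace_estimate(count, total, num_values):
    """Laplace estimate: add one pseudo-count per possible value."""
    return Fraction(count + 1, total + num_values)


def m_estimate(count, total, prior, m):
    """General m-estimate: blend the observed frequency with a prior,
    where m is the weight (equivalent sample size) given to the prior."""
    return (count + m * prior) / (total + m)


# Hypothetical toy attribute with three possible values, one of them unseen.
domain = ["sunny", "rain", "overcast"]
observed = ["sunny", "sunny", "rain", "sunny"]
counts = Counter(observed)
total = len(observed)
uniform = Fraction(1, len(domain))  # assumed uniform prior

for value in domain:
    c = counts[value]
    print(
        value,
        frequency_estimate(c, total),
        laplace_estimate(c, total, len(domain)),
        m_estimate(c, total, uniform, m=1),
    )
```

In this sketch the unseen value gets probability 0 under the frequency estimate, which is the zero-frequency problem, while both smoothed estimates stay strictly positive. With a uniform prior and m equal to the number of possible values, the m-estimate coincides with the Laplace estimate, so the Laplace estimate can be read as one particular setting of m and the prior.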
- Source: Available from K.C. Santosh
- "These extreme probabilities are "very strong" and may affect the decision process. In such a case, one can use Laplace estimator [Jiang et al., 2007]."
ABSTRACT: In this paper, we study Bayesian networks (BN) for form identification based on partially filled fields. The approach uses electronic ink-tracing files without any information about form structure. Given a form format, the ink-tracing files are used to build the BN by providing the possible relationships between corresponding fields using conditional probabilities, going from individual fields up to the complete model construction. To simplify the BN, we subdivide a single form into three areas (header, body, and footer) and integrate them together, and we study three fundamental BN learning algorithms: Naive, Peter & Clark (PC), and maximum weighted spanning tree (MWST). We validate this framework on a real-world industrial problem, namely electronic note-taking in form processing. The approach provides satisfactory results, attesting to the interest of BN for incomplete form analysis problems in particular.
International Journal of Machine Learning and Cybernetics 06/2014; 6(3). DOI:10.1007/s13042-014-0234-4
07/2011; 6(7):1368-1373. DOI:10.4304/jsw.6.7.1368-1373
- "is another method to estimate probability, which has already been applied to improve the class probability estimation of Bayesian classifiers successfully. In this paper, we try to investigate its application in decision tree learning."
ABSTRACT: Frequent Itemsets Mining Classifier (FISC) is an improved Bayesian classifier that averages all classifiers built from frequent itemsets. In learning Bayesian network classifiers, estimating probabilities from a given set of training examples is crucial, and it has been shown that the m-estimate can scale up the accuracy of many Bayesian classifiers. Thus, a natural question is whether FISC with the m-estimate can perform even better. In response to this question, we aim to scale up the accuracy of FISC by the m-estimate and propose new probability estimation formulas. The experimental results show that the Laplace estimate used in the original FISC does not perform very well and that our m-estimate can greatly scale up the accuracy; it even outperforms the other outstanding Bayesian classifiers used for comparison.
Keywords: Frequent Itemsets Mining Classifier; estimating probabilities; Laplace estimate; m-estimate
10/2010: pages 357-364;