Scaling Up the Accuracy of Bayesian Network Classifiers by M-Estimate.
ABSTRACT: In learning Bayesian network classifiers, estimating probabilities from a given set of training examples is crucial. In many
cases, a probability can be estimated as the fraction of times an event is observed to occur over the total number of opportunities.
However, when training examples are scarce, this frequency-based estimate inevitably suffers from the zero-frequency
problem. To avoid this problem, the Laplace estimate is usually used to estimate probabilities. The m-estimate is another
well-known probability estimation method. Thus, a natural question is whether a Bayesian network classifier with the
m-estimate can perform even better. To answer this question, we single out a special m-estimate method and empirically
investigate its effect on various Bayesian network classifiers, such as Naive Bayes (NB), Tree Augmented Naive Bayes (TAN),
Averaged One-Dependence Estimators (AODE), and Hidden Naive Bayes (HNB). Our experiments show that the classifiers with our
m-estimate perform better than the ones with the Laplace estimate.
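
The three estimators named in the abstract can be compared with a minimal sketch. It assumes the textbook form of the m-estimate, (n_c + m*p) / (n + m), with a uniform prior p = 1/k over k possible values; the abstract does not spell out the "special" m-estimate the authors chose, so the function names, the prior, and the value of m below are illustrative assumptions rather than the paper's method.

```python
from fractions import Fraction


def relative_frequency(count, total):
    """Raw frequency estimate: zero whenever the event was never observed."""
    return Fraction(count, total)


def laplace_estimate(count, total, num_values):
    """Laplace estimate: add one pseudo-count per possible value."""
    return Fraction(count + 1, total + num_values)


def m_estimate(count, total, prior, m):
    """Textbook m-estimate: blend the observed counts with a prior
    that is weighted as if it were m extra (virtual) examples."""
    return (count + m * Fraction(prior)) / (total + m)


# Hypothetical counts: an attribute value never seen among 10 training
# examples of a class, where the attribute has 3 possible values.
count, total, num_values = 0, 10, 3
uniform_prior = Fraction(1, num_values)  # assumed prior, not from the paper

print(relative_frequency(count, total))            # zero-frequency problem
print(laplace_estimate(count, total, num_values))  # smoothed, non-zero
print(m_estimate(count, total, uniform_prior, 1))  # weaker smoothing, m = 1
```

With a uniform prior and m equal to the number of possible values, this form of the m-estimate reduces to the Laplace estimate, so m can be read as a knob for how strongly the prior is trusted relative to the observed counts.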
- Source: Available from academypublisher.com. JSW. 01/2011; 6:1368-1373.
Article: Learning random forests for ranking.
ABSTRACT: The random forests (RF) algorithm, which combines the predictions from an ensemble of random trees, has achieved significant improvements in classification accuracy. In many real-world applications, however, ranking is required in order to make optimal decisions. Thus, we focus on the ranking performance of RF in this paper. Our experimental results on all 36 UC Irvine Machine Learning Repository (UCI) data sets published on the main website of the Weka platform show that RF does not perform well in ranking and is roughly on par with a single C4.4 tree. This raises the question of whether improvements to RF can scale up its ranking performance. To answer this question, we single out an improved random forests (IRF) algorithm. Instead of the information gain measure and the maximum-likelihood estimate, IRF uses the average gain measure and the similarity-weighted estimate. Our experiments show that IRF significantly outperforms all the other algorithms in the comparison in terms of ranking while maintaining the high classification accuracy that characterizes RF.
Frontiers of Computer Science in China. 01/2011; 5:79-86. Impact Factor: 0.27.
- ABSTRACT: Frequent Itemsets Mining Classifier (FISC) is an improved Bayesian classifier that averages all classifiers built from frequent itemsets. In learning Bayesian network classifiers, estimating probabilities from a given set of training examples is crucial, and it has been shown that the m-estimate can scale up the accuracy of many Bayesian classifiers. Thus, a natural question is whether FISC with the m-estimate can perform even better. In response to this question, we aim to scale up the accuracy of FISC with the m-estimate and propose new probability estimation formulas. The experimental results show that the Laplace estimate used in the original FISC does not perform very well and that our m-estimate can greatly scale up the accuracy; FISC with our m-estimate even outperforms the other outstanding Bayesian classifiers used for comparison.
Keywords: Frequent Itemsets Mining Classifier; estimating probabilities; Laplace estimate; m-estimate.
10/2010: pages 357-364.