Conference Paper

A Comparison Study: Web Pages Categorization with Bayesian Classifiers

DOI: 10.1109/HPCC.2008.80
Conference: 10th IEEE International Conference on High Performance Computing and Communications, HPCC 2008, 25-27 Sept. 2008, Dalian, China
Source: DBLP


In recent years, web mining has become a hotspot of data mining with the development of the Internet. Web page classification is one of the essential techniques for web mining, since identifying the web pages that belong to a class of interest is often the first step in mining the web. The high-dimensional text vocabulary space is one of the main challenges of web page classification. In this paper, we study the capabilities of Bayesian classifiers for web page categorization. Several feature selection techniques, such as Chi-Squared, Information Gain, and Gain Ratio, are used to select relevant words in web pages. Results on a benchmark dataset show that Aggregating One-Dependence Estimators (AODE) and Hidden Naive Bayes (HNB) are both more competitive than other traditional methods.
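
As a rough illustration of the three feature selection criteria named in the abstract, the following sketch scores a single word against a single class from a 2x2 contingency table of document counts. It is not taken from the paper: the function names, the binary word-presence representation, and the two-class setting are all assumptions made for the example, and the paper's actual setup may differ.

```python
import math

# Hypothetical helper names; counts follow the usual 2x2 layout (assumed):
#   a = documents in the class that contain the word
#   b = documents outside the class that contain the word
#   c = documents in the class that lack the word
#   d = documents outside the class that lack the word

def entropy(counts):
    """Shannon entropy (in bits) of a distribution given as raw counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total) for n in counts if n > 0)

def chi_squared(a, b, c, d):
    """Chi-squared statistic for independence of word presence and class."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denom

def information_gain(a, b, c, d):
    """Reduction in class entropy after observing word presence or absence."""
    n = a + b + c + d
    if n == 0:
        return 0.0
    prior = entropy([a + c, b + d])
    with_word = (a + b) / n * entropy([a, b])
    without_word = (c + d) / n * entropy([c, d])
    return prior - with_word - without_word

def gain_ratio(a, b, c, d):
    """Information gain normalized by the entropy of the presence/absence split."""
    split = entropy([a + b, c + d])
    if split == 0:
        return 0.0
    return information_gain(a, b, c, d) / split
```

Under these assumptions, a word that occurs in exactly the documents of one class (for example a=5, b=0, c=0, d=5) gets a chi-squared score of 10 and an information gain of 1 bit, while a word spread evenly across both classes (a=b=c=d=2) scores 0 on all three criteria. Ranking the vocabulary by any of these scores and keeping the top words is one common way to shrink a high-dimensional vocabulary before training a classifier.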
