Conference Proceeding

Applying Web analysis in Web page filtering

Fac. of Bus. & Econ., Hong Kong Univ., China;
07/2004; DOI:10.1109/JCDL.2004.1336155 ISBN: 1-58113-832-6 pp.376- In proceeding of: Digital Libraries, 2004. Proceedings of the 2004 Joint ACM/IEEE Conference on
Source: IEEE Xplore

ABSTRACT Vertical search engines provide Web users with an alternative way to search for information on the Web by providing customized searching in particular domains. However, two issues need to be addressed when developing these search engines: how to locate relevant documents on the Web and how to filter out irrelevant documents from a set of documents collected from the Web. This paper reports research addressing the second issue. It proposes a machine learning-based approach that combines Web content analysis and Web structure analysis.

    Article: Incorporating Hyperlink Analysis in Web Page Clustering
    ABSTRACT: The World Wide Web is growing rapidly and has become a very important source of information for various academic and commercial applications. However, because of the large number of documents online, it is becoming increasingly difficult to search for useful information on the Web. General-purpose Web search engines, such as Google and AltaVista, present search results as ranked lists. Such ranked lists can only show users the first few documents of the search results and fail to give them a quick overview of the retrieved document set. To address this problem, clustering techniques are often used to group documents into different topics. While traditional clustering algorithms have been applied to Web page clustering, such techniques do not make use of the unique characteristics of the Web, such as its hyperlink structures. In this study, we propose to incorporate hyperlink analysis into the traditional vector space model used in document clustering. Specifically, we will introduce a new metric, HFIDF, based on link analysis, to be used with the traditional TFIDF (term frequency multiplied by inverse document frequency) in the similarity measure of clustering algorithms. The proposed study will investigate whether the use of Web structure analysis techniques improves the performance of document clustering in presenting Web search results.
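The traditional TFIDF weighting and vector-space similarity mentioned in the abstract above can be illustrated with a minimal sketch. It covers only the standard content-based part: the HFIDF metric is not defined in this record, so it is not implemented here. The function names (`tfidf_vectors`, `cosine`), the logarithmic IDF variant, and the toy documents are assumptions made for illustration, not the authors' implementation.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one {term: tf * idf} dict per tokenized document.

    Assumption: idf = log(N / df), one common variant among several.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)  # raw term frequency within this document
        vectors.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Hypothetical toy corpus: two Web-related pages and one unrelated page.
docs = [
    "web search engine ranking".split(),
    "web page clustering search".split(),
    "protein folding simulation".split(),
]
vecs = tfidf_vectors(docs)
```

In this sketch, `cosine(vecs[0], vecs[1])` is positive because the two Web-related documents share terms, while `cosine(vecs[0], vecs[2])` is zero because they share none. A link-based weight such as the proposed HFIDF would presumably be combined with these content-based weights, but how that is done is not stated in this record.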

Keywords

alternative way
combines Web content analysis
irrelevant documents
issues
machine learning-based approach
paper reports
relevant documents
second issue
Vertical search engines
Web
Web structure analysis
Web users

M. Chau