Conference Proceeding
Applying Web analysis in Web page filtering
Fac. of Bus. & Econ., Hong Kong Univ., China
07/2004
DOI:10.1109/JCDL.2004.1336155
ISBN: 1-58113-832-6 pp.376- In proceedings of: Digital Libraries, 2004. Proceedings of the 2004 Joint ACM/IEEE Conference on
Source: IEEE Xplore
Citations (0)
Cited In (1)
Article: Incorporating Hyperlink Analysis in Web Page Clustering
ABSTRACT: The World Wide Web is growing rapidly and has become a very important source of information for various academic and commercial applications. However, because of the large number of documents online, it is becoming increasingly difficult to search for useful information on the Web. General-purpose Web search engines, such as Google and AltaVista, present search results as ranked lists. Such ranked lists can only show users the first few documents of the search results and fail to give them a quick overview of the retrieved document set. To address this problem, clustering techniques are often used to group documents into different topics. While traditional clustering algorithms have been applied to Web page clustering, they do not make use of the unique characteristics of the Web, such as its hyperlink structure. In this study, we propose to incorporate hyperlink analysis into the traditional vector space model used in document clustering. Specifically, we will introduce a new metric, HFIDF, based on link analysis, to be used alongside the traditional TFIDF (term frequency multiplied by inverse document frequency) in the similarity measure of clustering algorithms. The proposed study will investigate whether the use of Web structure analysis techniques improves the performance of document clustering in presenting Web search results.
Keywords
alternative way
combines Web content analysis
irrelevant documents
issues
machine learning-based approach
paper reports
relevant documents
second issue
Vertical search engines
Web
Web structure analysis
Web users