Conference Paper

Data Mining for Navigation Generating System with Unorganized Web Resources

Abstract

Users prefer navigating subjects through organized topics in an abundance of resources to scanning lists of pages retrieved from search engines. We propose a framework that clusters frequent itemsets (sets of common words) into topics, produces a hierarchical list, and then generates a topic sequence from a collection of documents. The framework regenerates the next sequence when a user clicks a topic. Treating browsing to a topic as a kind of search for that topic, the framework issues a query that uses the feature terms in the document representation of the selected topic as keywords. Our ranking method in the search process combines content analysis that retains the spatial information of search keywords with link analysis of documents. Using an implementation of the navigation generating system, the experiments show that a navigation list can be determined from the clustering results with regard to the variance ratio of between-cluster and within-cluster distances. Agglomerative clustering is used to restructure the extracted topics into a hierarchical navigation list.
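To make the notion of frequent itemsets (sets of common words) concrete, here is a minimal sketch, not the paper's implementation. It treats each document as a set of words and keeps every word set that appears in at least `min_support` documents. The function name `frequent_itemsets`, the parameters `min_support` and `max_size`, and the sample documents are all assumptions made for illustration. The enumeration is brute force rather than an optimized miner such as Apriori.

```python
from itertools import combinations


def frequent_itemsets(documents, min_support, max_size=2):
    """Return {itemset: document_count} for word sets that occur in
    at least `min_support` documents (hypothetical helper)."""
    # Each document becomes a set of lowercase words.
    doc_sets = [set(doc.lower().split()) for doc in documents]
    counts = {}
    for words in doc_sets:
        # Count every word combination up to max_size once per document.
        for size in range(1, max_size + 1):
            for combo in combinations(sorted(words), size):
                key = frozenset(combo)
                counts[key] = counts.get(key, 0) + 1
    # Keep only the itemsets that meet the support threshold.
    return {items: n for items, n in counts.items() if n >= min_support}


# Illustrative documents, not data from the paper.
docs = [
    "data mining web",
    "web mining navigation",
    "data mining clustering",
]
itemsets = frequent_itemsets(docs, min_support=2)
for items, n in sorted(itemsets.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(items), n)
```

In a framework like the one described, itemsets such as these would then be the units that get clustered into topics; how that clustering is done is not specified in the abstract.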
Article
Using existing web resources for e-Learning is a very promising idea, especially for reducing the cost of authoring. Envisioned as open-source, completely free, and frequently updated, Wikipedia could be a good candidate. Even though Wikipedia is structured by categories, these are sometimes not dynamically updated when pages are modified. For web resources used in e-Learning, it is necessary to provide a navigation path in Wikipedia that semantically maps the learning material and is not based merely on the category structure. The desired learning material could be provided on request from search results. We introduce in this paper the use of Fourier Domain Scoring (FDS) as the ranking method for searching a certain collection of Wikipedia web pages. Unlike other methods that recognize only the number of occurrences of query terms, FDS can also recognize the spread of query terms throughout the content of web pages. Based on the experiments, we concluded that the irrelevant results retrieved are mainly influenced by the characteristics of Wikipedia. Given that any part of a Wikipedia web page can be changed by anyone, we concluded that it is possible for only some parts of a retrieved web page to be strongly related to the query terms.
Article
The requirements for effective search and management of the WWW are stronger than ever. Currently, Web documents are classified based on their content, without taking into account the fact that these documents are connected to each other by links. We claim that a page's classification is enriched by the detection of its incoming links' semantics. This would enable effective browsing and enhance the validity of search results in the WWW context. Another aspect that is underaddressed and strictly related to the tasks of browsing and searching is the similarity of documents at the semantic level. The above observations lead us to the adoption of a hierarchy of concepts (ontology) and a thesaurus to exploit links and provide a better characterization of Web documents. The enhancement of document characterization makes operations such as clustering and labeling very interesting. To this end, we devised a system called THESUS. The system deals with an initial set of Web documents, extracts keywords from all pages' incoming links, and converts them to semantics by mapping them to a domain's ontology. Then a clustering algorithm is applied to discover groups of Web documents. The effectiveness of the clustering process is based on the use of a novel similarity measure between documents characterized by sets of terms. Web documents are organized into thematic subsets based on their semantics. The subsets are then labeled, thereby enabling easier management (browsing, searching, querying) of the Web. In this article, we detail the process of this system and give an experimental analysis of its results.
Conference Paper
With the explosive growth of the World-Wide Web, it is becoming increasingly difficult for users to collect and organize Web pages that are relevant to a particular topic. To address this problem we are developing WTMS, a system for Web Topic Management. In this paper we explain how WTMS collects Web pages for a topic and organizes them at various levels of abstraction. We also introduce the user interface of the system that smoothly integrates querying and browsing. Moreover, we present the various views of the interface that allow the user to navigate through the information space.
Article
Interestingness measures play an important role in data mining, regardless of the kind of patterns being mined. These measures are intended for selecting and ranking patterns according to their potential interest to the user. Good measures also allow the time and space costs of the mining process to be reduced. This survey reviews the interestingness measures for rules and summaries, classifies them from several perspectives, compares their properties, identifies their roles in the data mining process, gives strategies for selecting appropriate measures for applications, and identifies opportunities for future research in this area.
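As a small illustration of what an interestingness measure computes, the sketch below evaluates three widely used measures for an association rule: support, confidence, and lift. The abstract does not say which measures the survey covers, so this selection is an assumption. The function name `rule_measures` and the sample transactions are also hypothetical.

```python
def rule_measures(transactions, antecedent, consequent):
    """Support, confidence, and lift of the rule antecedent -> consequent
    (hypothetical helper for illustration)."""
    antecedent, consequent = set(antecedent), set(consequent)
    n = len(transactions)
    # Count transactions containing each side and both sides of the rule.
    n_a = sum(1 for t in transactions if antecedent <= set(t))
    n_c = sum(1 for t in transactions if consequent <= set(t))
    n_both = sum(1 for t in transactions if (antecedent | consequent) <= set(t))
    support = n_both / n
    confidence = n_both / n_a if n_a else 0.0
    # Lift compares the rule's confidence with the consequent's base rate.
    lift = confidence / (n_c / n) if n_c else 0.0
    return {"support": support, "confidence": confidence, "lift": lift}


# Illustrative transactions, not data from the survey.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
measures = rule_measures(transactions, {"bread"}, {"milk"})
print(measures)
```

A lift below 1, as in this toy data, indicates that the antecedent makes the consequent slightly less likely than its base rate, which is the kind of signal used to rank or discard patterns.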
Article
The technology for building knowledge-based systems by inductive inference from examples has been demonstrated successfully in several practical applications. This paper summarizes an approach to synthesizing decision trees that has been used in a variety of systems, and it describes one such system, ID3, in detail. Results from recent studies show ways in which the methodology can be modified to deal with information that is noisy and/or incomplete. A reported shortcoming of the basic algorithm is discussed and two means of overcoming it are compared. The paper concludes with illustrations of current research directions.
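The abstract does not state how ID3 picks the attribute to split on. ID3 is commonly described as choosing the attribute with the highest information gain, and the sketch below shows only that selection step under this assumption. The helper names (`entropy`, `information_gain`, `best_attribute`) and the toy data are hypothetical, and the full recursive tree construction is omitted.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(rows, labels, attribute):
    """Entropy reduction obtained by splitting the examples on `attribute`."""
    # Group the class labels by the value each example has for the attribute.
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attribute], []).append(label)
    # Weighted average entropy of the groups after the split.
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def best_attribute(rows, labels, attributes):
    """Attribute with the highest information gain."""
    return max(attributes, key=lambda a: information_gain(rows, labels, a))


# Toy examples, not data from the paper.
rows = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rain", "windy": "no"},
    {"outlook": "rain", "windy": "yes"},
]
labels = ["play", "play", "stay", "stay"]
chosen = best_attribute(rows, labels, ["outlook", "windy"])
print(chosen)
```

Here `outlook` separates the two classes perfectly while `windy` carries no information, so the split would be made on `outlook`.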
Article
The importance of a Web page is an inherently subjective matter, which depends on the reader's interests, knowledge and attitudes. But there is still much that can be said objectively about the relative importance of Web pages. This paper describes PageRank, a method for rating Web pages objectively and mechanically, effectively measuring the human interest and attention devoted to them. We compare PageRank to an idealized random Web surfer. We show how to efficiently compute PageRank for large numbers of pages. And, we show how to apply PageRank to search and to user navigation.
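The random-surfer idea can be sketched as a simple power iteration. This is a minimal sketch under stated assumptions, not the paper's large-scale computation: it uses the normalized variant in which ranks sum to 1, a damping factor of 0.85, and even redistribution of rank from pages without outgoing links. The function name `pagerank` and the four-page link graph are hypothetical.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Approximate PageRank by power iteration.

    `links` maps each page to the list of pages it links to.
    The damping value and iteration count are assumed defaults.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives the random-jump share first.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its damped rank evenly over its outgoing links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


# Hypothetical link graph for illustration.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(links)
for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(page, round(score, 4))
```

In this graph page C collects links from three pages and ends up ranked highest, while D, which nothing links to, keeps only the random-jump share.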