Building a Multi-level Database for Efficient Information Retrieval: A Framework Definition.
ABSTRACT With the explosive growth of the Internet and the World Wide Web, the amount of information available online is growing exponentially. As this volume increases, searching for and locating information efficiently becomes increasingly difficult and resource-demanding. Information overload has become a pressing research problem because current search mechanisms, such as conventional search engines, suffer from both low precision and low recall. A more dynamic, scalable and accurate search methodology is needed to overcome these limitations. This paper proposes a methodology that combines several research areas, including Web mining and relational database systems. We develop a proof-of-concept prototype consisting of an agent that extracts information from individual Web pages and a dynamic multi-level relational schema that encapsulates this information for later processing. The prototype provides users with a higher level of scalability and flexibility and can be used to search the Internet and intranets across large-scale organizations.
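The abstract does not spell out the prototype's schema or agent, so the following is only a minimal sketch of the general idea, under stated assumptions: a small extraction step pulls the title and body words from one HTML page, and a two-level relational store keeps a page-level summary table plus a term-level detail table. The table names (`page`, `page_term`), the helper names, and the example URLs are all hypothetical and are not taken from the paper.

```python
import sqlite3
from html.parser import HTMLParser


class PageExtractor(HTMLParser):
    """Collects the <title> text and the words found outside <title>."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""
        self.words = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data.strip()
        else:
            self.words.extend(w.lower() for w in data.split())


def build_store():
    """Create the hypothetical two-level schema in an in-memory database."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE page (id INTEGER PRIMARY KEY, url TEXT UNIQUE, title TEXT);
        CREATE TABLE page_term (
            page_id INTEGER REFERENCES page(id),
            term TEXT,
            frequency INTEGER,
            PRIMARY KEY (page_id, term)
        );
    """)
    return db


def index_page(db, url, html):
    """Extract one page and store its summary row and its term rows."""
    parser = PageExtractor()
    parser.feed(html)
    parser.close()
    cur = db.execute("INSERT INTO page (url, title) VALUES (?, ?)",
                     (url, parser.title))
    page_id = cur.lastrowid
    counts = {}
    for word in parser.words:
        counts[word] = counts.get(word, 0) + 1
    db.executemany("INSERT INTO page_term VALUES (?, ?, ?)",
                   [(page_id, term, freq) for term, freq in counts.items()])
    db.commit()
    return page_id


def search(db, term):
    """Return (url, title, frequency) rows for a term, most frequent first."""
    rows = db.execute(
        "SELECT p.url, p.title, t.frequency "
        "FROM page p JOIN page_term t ON t.page_id = p.id "
        "WHERE t.term = ? ORDER BY t.frequency DESC",
        (term.lower(),))
    return rows.fetchall()


db = build_store()
index_page(db, "http://example.org/a",
           "<html><head><title>Web Mining</title></head>"
           "<body><p>mining web data with web agents</p></body></html>")
index_page(db, "http://example.org/b",
           "<html><head><title>Databases</title></head>"
           "<body><p>relational web schema</p></body></html>")
results = search(db, "web")
```

Here queries run against the relational store rather than against the pages themselves, which is the separation the abstract describes; a real agent would also need to fetch pages, skip script and style content, and normalize punctuation, none of which this sketch attempts.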
Available from: Christos Tjortjis, Jun 02, 2015
Source available from: Choochart Haruechaiyasak
Conference Paper: Web document classification based on fuzzy association
ABSTRACT: In this paper, a method of automatically classifying web documents into a set of categories using the fuzzy association concept is proposed. Using the same word or vocabulary to describe different entities creates ambiguity, especially in the web environment where the user population is large. To solve this problem, fuzzy association is used to capture the relationships among different index terms or keywords in the documents, i.e., each pair of words has an associated value to distinguish itself from the others. Therefore, the ambiguity in word usage is avoided. Experiments are conducted using data sets collected from two web portals, Yahoo! and the Open Directory Project. We compare our approach to the vector space model with the cosine coefficient. The results show that our approach yields higher accuracy than the vector space model.
Computer Software and Applications Conference, 2002. COMPSAC 2002. Proceedings. 26th Annual International; 02/2002
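For readers unfamiliar with the baseline named in this abstract, the following is a minimal sketch of vector space classification with the cosine coefficient. It illustrates only the baseline, not the paper's fuzzy association method. The raw term-frequency weighting, the function names, and the two toy category profiles are assumptions made for illustration.

```python
import math
from collections import Counter


def term_vector(text):
    """Build a raw term-frequency vector from whitespace-separated tokens."""
    return Counter(text.lower().split())


def cosine_coefficient(a, b):
    """Cosine of the angle between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_a * norm_b)


def classify(document, category_profiles):
    """Return the category whose profile is most similar to the document."""
    doc_vec = term_vector(document)
    return max(category_profiles,
               key=lambda c: cosine_coefficient(doc_vec, category_profiles[c]))


# Toy category profiles (hypothetical, not from the paper's data sets).
profiles = {
    "sports": term_vector("football match team score league team"),
    "computing": term_vector("database query index search engine web"),
}
label = classify("web search engine index", profiles)
```

Because this baseline matches documents to categories only through shared terms, two texts that describe the same topic with different vocabulary score zero, which is the kind of word-usage problem the abstract says fuzzy association is meant to address.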
Article: Databases deepen the Web
ABSTRACT: The Web has become the preferred medium for many database applications, such as e-commerce and digital libraries. These applications store information in huge databases that users access, query, and update through the Web. Database-driven Web sites have their own interfaces and access forms for creating HTML pages on the fly. Web database technologies define the way that these forms can connect to and retrieve data from database servers. The number of database-driven Web sites is increasing exponentially, and each site is creating pages dynamically, pages that are hard for traditional search engines to reach. Such search engines crawl and index static HTML pages; they do not send queries to Web databases. The information hidden inside Web databases is called the "deep Web" in contrast to the "surface Web" that traditional search engines access easily. We expect deep Web search engines and technologies to improve rapidly and to dramatically affect how the Web is used by providing easy access to many more information resources.
Computer 02/2004; 37(1-37):116 - 117. DOI:10.1109/MC.2004.1260731 · 1.44 Impact Factor
ABSTRACT: In this paper, we present Google, a prototype of a large-scale search engine which makes heavy use of the structure present in hypertext. Google is designed to crawl and index the Web efficiently and produce much more satisfying search results than existing systems. The prototype with a full text and hyperlink database of at least 24 million pages is available at http://google.stanford.edu/. To engineer a search engine is a challenging task. Search engines index tens to hundreds of millions of web pages involving a comparable number of distinct terms. They answer tens of millions of queries every day. Despite the importance of large-scale search engines on the web, very little academic research has been done on them. Furthermore, due to rapid advance in technology and web proliferation, creating a web search engine today is very different from three years ago. This paper provides an in-depth description of our large-scale web search engine, the first such detailed public description we know of to date. Apart from the problems of scaling traditional search techniques to data of this magnitude, there are new technical challenges involved with using the additional information present in hypertext to produce better search results. This paper addresses this question of how to build a practical large-scale system which can exploit the additional information present in hypertext. Also we look at the problem of how to effectively deal with uncontrolled hypertext collections where anyone can publish anything they want.
Keywords: World Wide Web, Search Engines, Information Retrieval, PageRank, Google
Computer Networks and ISDN Systems 04/1998; 30:107-117. DOI:10.1016/S0169-7552(98)00110-X