Conference Paper

Temporal Information Retrieval in Cooperative Search Engine.

Dept. of Inf. & Comput. Sci., Tokyo Univ., Japan
DOI: 10.1109/DEXA.2003.1232026
Conference: 14th International Workshop on Database and Expert Systems Applications (DEXA'03), September 1-5, 2003, Prague, Czech Republic
Source: IEEE Xplore


In business, the retrieval of up-to-date, or fresh, information is very important. It is difficult for conventional search engines based on a centralized architecture to retrieve fresh information, because they take a long time to collect documents via Web robots. In contrast, a search engine based on a distributed architecture does not need to collect documents, because each site builds its index independently. As a result, distributed search engines can be used to retrieve fresh information. However, fast indexing alone is not enough to retrieve fresh information; support for retrieval based on temporal information is also required. In this paper, we describe temporal information retrieval in distributed search engines. In particular, we propose a content-based comparison method to avoid spamming.
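The abstract does not say how the content-based comparison works, so the following is only a minimal sketch of one plausible reading: a site's index accepts a newer timestamp for a page only when the page content has actually changed, so re-stamping an unchanged page cannot make it look fresh. The names `IndexEntry`, `fingerprint`, and `reindex` are hypothetical, and the use of a SHA-256 hash over whitespace- and case-normalized text is an assumption, not the paper's method.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass
class IndexEntry:
    fingerprint: str  # hash of the normalized page content (assumed scheme)
    updated_at: int   # timestamp the index trusts, in seconds


def fingerprint(text: str) -> str:
    # Collapse whitespace and case so trivial edits do not change the hash.
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def reindex(entry: Optional[IndexEntry], text: str, claimed_time: int) -> IndexEntry:
    # Accept the claimed modification time only if the content really changed.
    fp = fingerprint(text)
    if entry is not None and entry.fingerprint == fp:
        return entry  # unchanged content keeps its old timestamp
    return IndexEntry(fp, claimed_time)


first = reindex(None, "Quarterly results released.", 1000)
touched = reindex(first, "Quarterly  results RELEASED.", 2000)  # cosmetic change only
revised = reindex(touched, "Quarterly results revised upward.", 3000)
```

In this sketch `touched` keeps the timestamp of `first`, while `revised` takes the new one. An exact hash is the simplest comparison; a similarity measure would be needed to catch small deliberate edits, which this sketch does not attempt.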

  • ABSTRACT: Search engines are the usual way to retrieve web pages. However, conventional search engines have very long update intervals because they are based on a centralized architecture, which gathers documents using robots. We therefore proposed the Cooperative Search Engine (CSE) to reduce the update interval. CSE is a distributed search engine that integrates small local search engines into a large global search engine by using local meta search engines. A local meta search engine hides a local search engine in each web site. Although CSE can reduce the update interval, its retrieval performance is not sufficient, so we proposed several speed-up techniques. In this paper, we describe the structure and behavior of CSE and its efficiency.
    Information Networking, Wireless Communications Technologies and Network Applications, International Conference, ICOIN 2002, Cheju Island, Korea, January 30 - February 1, 2002, Revised Papers, Part II; 01/2002
  • ABSTRACT: Cooperative Search Engine (CSE) is a distributed search engine that can update its indexes in a very short time for the purpose of fresh information retrieval. In CSE, retrieval performance depends on cache contents, because communication delay occurs at retrieval time. However, the cache is invalidated as soon as the indexes are updated. Therefore, we need a persistent cache that can hold valid data before and after updating. In this paper, we describe the principle and evaluation of the persistent cache.
    22nd International Conference on Distributed Computing Systems, Workshops (ICDCSW '02) July 2-5, 2002, Vienna, Austria, Proceedings; 01/2002
  • ABSTRACT: Conventional search engines employ a centralized architecture. However, such engines are not suitable for fresh information retrieval because they take a long time to collect Web pages using a Web robot (or crawler). There are also distributed search engines, such as Harvest, but they cannot update in a very short time, e.g. a few minutes. In this paper, we propose a distributed information retrieval system, called a cooperative search engine (CSE), in which multiple distributed local search engines communicate with each other to form a single global search engine. Each local search engine has its own index database and can update quickly. In addition, the local search engines cooperate with each other to reduce their workload. We describe the structure and behavior of CSE, and its efficiency.
    2002 Symposium on Applications and the Internet (SAINT 2002), 28 January - 1 February 2002, Nara City, Japan, Proceedings; 01/2002