Conference Paper

Temporal Information Retrieval in Cooperative Search Engine.

Dept. of Inf. & Comput. Sci., Tokyo Univ., Japan
DOI: 10.1109/DEXA.2003.1232026 Conference: 14th International Workshop on Database and Expert Systems Applications (DEXA'03), September 1-5, 2003, Prague, Czech Republic
Source: DBLP

ABSTRACT In business, the retrieval of up-to-date, or fresh, information is very important. It is difficult for conventional search engines based on a centralized architecture to retrieve fresh information, because they take a long time to collect documents via Web robots. In contrast, a search engine based on a distributed architecture does not need to collect documents, because each site builds its index independently. As a result, distributed search engines can be used to retrieve fresh information. However, fast indexing alone is not enough to retrieve fresh information; support for retrieval based on temporal information is also required. In this paper, we describe temporal information retrieval in distributed search engines. In particular, we propose a content-based comparison method to avoid spamming.

  • ABSTRACT: Crawling and indexing have both been considered for existing search engines, but whereas high-speed crawling has been studied widely, improvements in indexing speed have seldom been discussed. Building a search engine for fresh information, however, requires unifying the treatment of crawling and indexing. In this report, an index updating process based on a pipeline that unifies crawling and indexing is proposed. Techniques for a distributed index updating process are discussed with regard to a distributed cooperative search engine.
    18th International Conference on Advanced Information Networking and Applications (AINA 2004), 29-31 March 2004, Fukuoka, Japan; 01/2004
  • ABSTRACT: Search engines are usually used for web page retrieval. However, conventional search engines have the problem that their update intervals are very long, because they are based on a centralized architecture that gathers documents using robots. We therefore proposed the Cooperative Search Engine (CSE) in order to reduce the update interval. CSE is a distributed search engine that integrates small local search engines into a large global search engine by using local meta search engines. A local meta search engine hides a local search engine in each web site. Although CSE can reduce the update interval, its retrieval performance is insufficient, so we proposed several speed-up techniques. In this paper, we describe the structure and behavior of CSE and its efficiency.
    Information Networking, Wireless Communications Technologies and Network Applications, International Conference, ICOIN 2002, Cheju Island, Korea, January 30 - February 1, 2002, Revised Papers, Part II; 01/2002
  • ABSTRACT: Conventional search engines employ a centralized architecture. However, such engines are not suitable for fresh information retrieval because they take a long time to collect Web pages using a Web robot (or crawler). There are also distributed search engines such as Harvest, but they cannot update in a very short time, e.g. a few minutes. In this paper, we propose a distributed information retrieval system, called a cooperative search engine (CSE), in which multiple distributed local search engines communicate with each other in order to realize a global search engine as a whole. Each local search engine has its own index database and can update quickly. In addition, the local search engines cooperate with each other in order to reduce their workload. We describe the structure and behavior of CSE, and its efficiency.
    2002 Symposium on Applications and the Internet (SAINT 2002), 28 January - 1 February 2002, Nara City, Japan, Proceedings; 01/2002

