Conference Paper

An agent-based search engine based on the Internet search service on the CORBA

Authors:
  • National Yang Ming Chiao Tung University

Abstract

Search services are important tools on the World Wide Web, but standard Web search engines are far from ideal. Many researchers have therefore implemented multi-engine search services (MESS) using meta-brokers. However, these MESS make it difficult to integrate a new search engine, and applications that need search capabilities also find them difficult to use. In this paper we propose an Internet search service (ISS) based on CORBA. We follow the style of the Common Object Service Specification (COSS) to define the interface of the ISS, so that any search engine can easily be integrated into the multi-search service, and application programs can query it directly. In addition, two search engine agents are implemented in our project, one for Yahoo and the other for AltaVista. Programmers can use this interface to code their own search engine agents or to query the search service from their applications. Finally, we build a heterogeneous search engine agent based on this architecture.
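To make the interface idea concrete, here is a minimal Java sketch of what a COSS-style ISS contract might look like; the type and method names (InternetSearchService, QueryResult, search, engineName) are illustrative assumptions, since the paper's actual IDL definitions are not reproduced here. A per-engine agent (for Yahoo or AltaVista, say) would implement this contract, and client applications would program against it rather than against any particular engine.

```java
// Hypothetical sketch of a COSS-style Internet Search Service (ISS) contract.
// All names are illustrative only; the paper's actual IDL is not shown here.
import java.util.List;

public interface InternetSearchService {
    /** A single hit returned by an underlying engine agent. */
    record QueryResult(String title, String url, String snippet, double score) {}

    /** Submit a keyword query; maxHits caps the merged result size. */
    List<QueryResult> search(String keywords, int maxHits);

    /** Each engine agent (e.g. one for Yahoo, one for AltaVista) names itself. */
    String engineName();
}
```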


... To overcome the above-mentioned challenges, we use clustering to resolve the ambiguity of the user query, and we use a personalized clustering technique to manage cluster size. 2.0 RELATED WORK 2.1 Query Clustering A great challenge for search engines is to understand the search intent behind users' queries. Several traditional approaches to query understanding focus on exploiting information such as users' explicit feedback, implicit feedback, user profiles, thesauri, snippets, and anchor texts. ...
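As a rough illustration of the clustering idea described in this excerpt, the sketch below groups result snippets by term overlap so that different senses of an ambiguous query fall into different clusters. The greedy single-pass scheme and the Jaccard threshold are invented for illustration and are not the cited personalized clustering technique.

```java
// Illustrative sketch: cluster result snippets by term overlap so that
// different senses of an ambiguous query separate. Thresholds are invented.
import java.util.*;

public class SnippetClusterer {
    static Set<String> terms(String text) {
        return new HashSet<>(Arrays.asList(text.toLowerCase().split("\\W+")));
    }
    static double jaccard(Set<String> a, Set<String> b) {
        Set<String> inter = new HashSet<>(a); inter.retainAll(b);
        Set<String> union = new HashSet<>(a); union.addAll(b);
        return union.isEmpty() ? 0 : (double) inter.size() / union.size();
    }
    public static List<List<String>> cluster(List<String> snippets, double threshold) {
        List<List<String>> clusters = new ArrayList<>();
        for (String s : snippets) {
            List<String> best = null; double bestSim = threshold;
            for (List<String> c : clusters) {
                // compare against the cluster's first member as a cheap stand-in for a centroid
                double sim = jaccard(terms(s), terms(c.get(0)));
                if (sim >= bestSim) { best = c; bestSim = sim; }
            }
            if (best == null) { best = new ArrayList<>(); clusters.add(best); }
            best.add(s);
        }
        return clusters;
    }
}
```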
... The style of COSS (Common Object Service Specification) is used to define the ISS interface. A heterogeneous full-text search engine agent prototype can be implemented by integrating the above agents [1]. Batool Arzanian et al. propose a multi-agent architecture based on fuzzy concept networks for personalizing a meta-search engine. ...
... Is it true that data can be integrated for the standardization of data definitions and data structures by using a common conceptual schema across a collection of data sources?
  • Text Mining: indexing, retrieving, extraction, clustering [15], [9], [16], [17], [18], [19]
  • Syntactical: sample pseudonymized [20], [21], [22], [23]
  • Middleware: tier level with tree [24], [25], [26]
  • XML Document: hierarchical structure [27], [28], [29]; set similarity join [30], [31], [32]
  • Keyword-based Searching: heterogeneous type [33]
  • Elaboration: heterogeneous data sources [34]
  • Web Mining: data management [5], [35], [36]
  • Collaborating & Integrating: agent [37], [38], [39], [40]
With respect to the work by [13], four data sources with multiple audit streams from diverse cyber sensors are presented: (i) raw network traffic, (ii) netflow data, (iii) system calls, and (iv) output alerts from an IDS. Unfortunately, we assume this method cannot be effective against new intrusion threats. ...
... In the concept of collaborating and integrating with agents, these methods are essentially processes that function on behalf of other processes and users. Agent methods developed alongside the Web in the early 1990s and have grown rapidly since crawler systems were first used in search engines [37], [38]. Agents may be self-describing; they may be decentralized, distributed, autonomous, and heterogeneous. Various agent architectures have been proposed [39], [40]. ...
Article
Full-text available
Currently, much information exists in textual form; this information can be correlated and is appropriate for solving a particular problem. It could be data from the web, library data, logs, and past information stored as archives; such data can form patterns of specific information. Given a collection of datasets, we examine a sample of the data and look for patterns that may exist between certain pattern methods over time. In this paper, we present a data mining approach to collecting scattered information from the routine updates regularly issued by providers or the security community. This paper addresses problems, existing theories, and possible future research in this field.
... The major goal of this study was to design and implement a multi-threaded object request broker that is fully compliant with CORBA 2.0. The motivations for this work are as follows. First, it will support related work in our group, such as work on the concurrency control service, the object transaction service [8], and the heterogeneous full-text search engine agent [9], the aim of which is to develop a multi-threaded version. Second, prior to beginning this work, no commercial multi-threaded object request brokers were available. ...
Article
Full-text available
Multi-threaded programming is a well-known technique for improving the performance of applications. In a CORBA environment, clients can invoke shared remote objects. If these objects are single-threaded, the performance of large distributed applications suffers. This paper presents a detailed description of the design and implementation of a multi-threaded Object Request Broker (ORB) on CORBA. The ORB was implemented on top of Windows NT and the underlying TCP protocol. The system's performance in both one-way and two-way requests is compared with that of a well-known commercial product, the IONA Orbix.
Conference Paper
The distributed object-oriented computing model is the next logical step in developing distributed applications. In recent years, several object models have been proposed, such as COM/DCOM, CORBA, and JavaBeans. In CORBA, announced by the OMG, the object request broker is a software bus that connects applications and object components. In addition, multi-threaded programming is a well-known technique for improving the performance of applications. In a CORBA environment, clients can invoke shared remote objects. If those objects are single-threaded, system performance suffers in large distributed applications. We describe in detail the design and implementation of a multi-threaded object request broker based on CORBA. Our ORB was implemented atop Windows NT and the underlying TCP protocol. Finally, we compare our system's performance with IONA's Orbix, a well-known commercial product, in both one-way and two-way requests.
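A minimal sketch of the thread-per-request dispatch such an ORB might use, assuming a plain TCP listener and a worker pool; the actual GIOP/IIOP protocol, request demarshalling, and object adapter are omitted, and the port number and pool size are arbitrary.

```java
// Minimal sketch of thread-per-request dispatch over TCP, the concurrency
// pattern a multi-threaded ORB relies on. Wire protocol details are omitted.
import java.io.*;
import java.net.*;
import java.util.concurrent.*;

public class MiniOrbServer {
    public static void main(String[] args) throws IOException {
        ExecutorService workers = Executors.newFixedThreadPool(16); // pool size is arbitrary
        try (ServerSocket listener = new ServerSocket(9000)) {
            while (true) {
                Socket client = listener.accept();    // one TCP connection per client
                workers.submit(() -> handle(client)); // requests served concurrently
            }
        }
    }
    static void handle(Socket client) {
        try (client;
             BufferedReader in = new BufferedReader(new InputStreamReader(client.getInputStream()));
             PrintWriter out = new PrintWriter(client.getOutputStream(), true)) {
            String request = in.readLine();  // stand-in for a marshalled request
            out.println("reply:" + request); // two-way: send a reply; one-way would skip this
        } catch (IOException ignored) {}
    }
}
```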
Article
Full-text available
In this paper we have tried to model, design, and test a prototype Farsi/English search engine. The engine must cope with the features of the web medium, such as heterogeneity, volatility, and a huge amount of unstructured worldwide information. These features, as well as the rapid advance of technology, challenge the effectiveness of classical Information Retrieval (IR) techniques. Although a growing number of sites with Farsi language support exist, little research has addressed the computational linguistic aspects of this language, particularly in the fields of thesaurus construction and stemming. We have tried to draw on past research experience in designing the Farsi/English search engine. Unicode appears sufficiently capable of providing a conclusive environment in this respect, especially regarding the indexing and searching of web pages. Many common Farsi code pages are in use and must be converted into Unicode in order to cover most of the existing Farsi web pages. To handle the complexity of the system's analysis and design and to generate a visual, easy-to-scale model, the Unified Modeling Language (UML) was used; to assure scalability, distributed functionality, and reliability, we adopted proven industrial solutions: a relational database for managing the web-page indices, and clustering techniques to balance the high workload on the user interface and index management unit. We chose the Common Object Request Broker Architecture (CORBA) because of the system's distributed object-oriented design and our agent-oriented plans for the future, applying CORBA design ideas to build a scalable, platform-independent framework.
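The code-page conversion step can be illustrated with Java's standard charset machinery; windows-1256 is used here as an example legacy Arabic-script code page, since the paper does not enumerate the specific Farsi code pages it converts.

```java
// Sketch of normalizing a legacy code page to Unicode before indexing.
// windows-1256 is an assumed example; the paper's actual code-page list is not given.
import java.nio.charset.Charset;

public class CodePageNormalizer {
    public static String toUnicode(byte[] raw, String legacyCharset) {
        return new String(raw, Charset.forName(legacyCharset)); // decode to a Unicode String
    }
    public static void main(String[] args) {
        // "سلام" encoded in windows-1256
        byte[] pageBytes = {(byte) 0xD3, (byte) 0xE1, (byte) 0xC7, (byte) 0xE3};
        System.out.println(toUnicode(pageBytes, "windows-1256"));
    }
}
```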
Chapter
Full-text available
The Zone Routing Protocol (ZRP) is a hybrid routing protocol that proactively maintains routes within a local region of the network (which we refer to as the routing zone). Here, we describe the motivation for ZRP and its architecture, as well as the query control mechanisms, which are used to reduce the amount of traffic in the route discovery procedure. In this paper, we address the issue of configuring the ZRP to provide the best performance for a particular network at any time. Through NS2 simulation, we draw conclusions about the performance of the protocol. Keywords: Zone Routing Protocol, routing zone, query control mechanisms
Chapter
Most of the resources present in grids are underutilized these days. Therefore, one of the most important issues is the best utilization of grid resources based on user requests. The intelligent agent architecture proposed to handle this issue consists of four main parts. We discuss the need for and functionality of such an agent and propose a solution for resource sharing that addresses the problems faced by today's grids. A J2EE-based solution is developed as a proof of concept for the proposed technique. This paper addresses issues such as resource discovery, performance, security, and decentralized resource sharing, which are of concern in current grid environments. Keywords: grid, resource sharing, intelligent agent, decentralization
Conference Paper
Studies surrounding MPEG-7 have so far concentrated only on the normative components. While feature extraction has been an active research subject in the content-based retrieval domain, work on proper search engines is scarce. This paper presents SOLO, an MPEG-7 search tool prototype being developed at the University of Sydney. SOLO is built on the meta-search engine and mobile code paradigms and is equipped with computational intelligence technology.
Article
Full-text available
The Distributed Information Search COmponent (DISCO) is a prototype heterogeneous distributed database that accesses underlying data sources. The DISCO prototype currently focuses on three central research problems in the context of these systems. First, since the capabilities of the data sources differ, transforming queries into subqueries on the data sources is difficult. We call this the weak data source problem. Second, since each data source performs operations in a generally unique way, the cost of performing an operation may vary radically from one wrapper to another. We call this the radical cost problem. Finally, existing systems behave rudely when attempting to access an unavailable data source. We call this the ungraceful failure problem. DISCO copes with these problems as follows. For the weak data source problem, the database implementor defines precisely the capabilities of each data source. For the radical cost problem, the database implementor (optionally) defines cost information for some of the operations of a data source; the mediator uses this cost information to improve its cost model. To deal with ungraceful failures, queries return partial answers. A partial answer contains the part of the final answer to the query that was produced by the available data sources. The current working prototype of DISCO implements these solutions and operates over a collection of wrappers that access information both in files and on the World Wide Web.
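The partial-answer behavior can be sketched as follows: fan the query out to all wrappers, keep whatever the available sources return, and report the unavailable ones. The Wrapper interface, thread pool, and 2-second timeout are invented details for illustration, not DISCO's actual mechanism.

```java
// Sketch of the "partial answer" idea: query every wrapper, keep the results of
// the available sources, and note the ones that failed. Details are invented.
import java.util.*;
import java.util.concurrent.*;

public class PartialAnswerQuery {
    interface Wrapper { List<String> query(String q) throws Exception; }

    public static List<String> run(Map<String, Wrapper> wrappers, String q) throws InterruptedException {
        ExecutorService pool = Executors.newCachedThreadPool();
        Map<String, Future<List<String>>> pending = new HashMap<>();
        wrappers.forEach((name, w) -> pending.put(name, pool.submit(() -> w.query(q))));
        List<String> partial = new ArrayList<>();
        for (var e : pending.entrySet()) {
            try {
                partial.addAll(e.getValue().get(2, TimeUnit.SECONDS));
            } catch (ExecutionException | TimeoutException ex) {
                System.err.println("source unavailable, answer is partial: " + e.getKey());
            }
        }
        pool.shutdownNow();
        return partial; // the part of the final answer produced by available sources
    }
}
```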
Article
Full-text available
This article describes and evaluates SavvySearch, a metasearch engine designed to intelligently select and interface with multiple remote search engines. The primary metasearch issue examined is the importance of carefully selecting and ranking remote search engines for user queries. We studied the efficacy of SavvySearch's incrementally acquired metaindex approach to selecting search engines by analyzing the effect of time and experience on performance. We also compared the metaindex approach to the simpler categorical approach and showed how much experience is required to surpass the simple scheme.
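A toy version of the incrementally acquired metaindex might look like the following: a per-(term, engine) weight nudged by click feedback and summed to rank engines for a query. The update constants and data layout are invented; SavvySearch's actual scheme differs in detail.

```java
// Toy sketch of an incrementally acquired metaindex: per-(term, engine) weights
// updated by feedback and used to rank engines. Constants are invented.
import java.util.*;

public class MetaIndex {
    private final Map<String, Map<String, Double>> weight = new HashMap<>(); // term -> engine -> score

    public void feedback(String term, String engine, boolean useful) {
        weight.computeIfAbsent(term, t -> new HashMap<>())
              .merge(engine, useful ? 0.1 : -0.1, Double::sum); // reward or penalize the engine
    }
    public List<String> rankEngines(Collection<String> engines, List<String> queryTerms) {
        return engines.stream() // highest summed weight across the query's terms first
            .sorted(Comparator.comparingDouble((String e) -> -queryTerms.stream()
                .mapToDouble(t -> weight.getOrDefault(t, Map.of()).getOrDefault(e, 0.0))
                .sum()))
            .toList();
    }
}
```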
Article
Document sources are available everywhere, both within the internal networks of organizations and on the Internet. Even individual organizations use search engines from different vendors to index their internal document collections. These search engines are typically incompatible in that they support different query models and interfaces, they do not return enough information with the query results for adequate merging of the results, and finally, in that they do not export metadata about the collections that they index (e.g., to assist in resource discovery). This paper describes STARTS, an emerging protocol for Internet retrieval and search that facilitates the task of querying multiple document sources. STARTS has been developed in a unique way. It is not a standard, but a group effort coordinated by Stanford's Digital Library project, and involving over 11 companies and organizations. The objective of this paper is not only to give an overview of the STARTS protocol proposal, but also to discuss the process that led to its definition.
Conference Paper
Many sources on the Internet and elsewhere rank the objects in query results according to how well these objects match the original query. For example, a real-estate agent might rank the available houses according to how well they match the user's preferred location and price. In this environment, "meta-brokers" usually query multiple autonomous, heterogeneous sources that might use varying result-ranking strategies. A crucial problem that a meta-broker then faces is extracting from the underlying sources the top objects for a user query according to the meta-broker's ranking function. This problem is challenging because these top objects might not be ranked high by the sources where they appear. In this paper we discuss strategies for solving this "meta-ranking" problem. In particular, we present a condition that a source must satisfy so that a meta-broker can extract the top objects for a query from the source without examining its entire contents. Not only is this condition necessary but it is also sufficient, and we show an algorithm to extract the top objects from sources that satisfy the given condition.
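Under (roughly) the paper's condition, namely that each source emits objects in non-increasing order of the meta-broker's own scoring function, the top objects can be extracted with a simple k-way merge that never scans a source past its contribution, as in this hedged sketch.

```java
// Sketch of extracting the global top-k via a k-way merge, assuming each source
// streams objects in non-increasing order of the meta-broker's scoring function.
import java.util.*;

public class MetaRanker {
    record Scored(String id, double metaScore) {}

    public static List<Scored> topK(List<Iterator<Scored>> sources, int k) {
        // priority queue holds the current head of each source stream, best first
        PriorityQueue<Map.Entry<Scored, Iterator<Scored>>> heads =
            new PriorityQueue<>((a, b) -> Double.compare(b.getKey().metaScore(), a.getKey().metaScore()));
        for (Iterator<Scored> it : sources)
            if (it.hasNext()) heads.add(Map.entry(it.next(), it));
        List<Scored> top = new ArrayList<>();
        while (top.size() < k && !heads.isEmpty()) {
            var best = heads.poll();
            top.add(best.getKey());
            if (best.getValue().hasNext()) heads.add(Map.entry(best.getValue().next(), best.getValue()));
        }
        return top; // no source is examined past its contribution to the top k
    }
}
```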
Article
The dozens of existing search tools and the keyword-based search model have become the main obstacles to accessing the ever-growing WWW. Various ranking algorithms, which are used to evaluate the relevance of documents to the query, have turned out to be impractical, because the information given by the user is too sparse for good estimation. In this paper, we propose a new approach to searching under the multi-engine search architecture to overcome these problems. It includes clustering of the search results and extraction of co-occurrence keywords, which, together with the user's feedback, better refine the query during the search process. In addition, our system constructs a concept space to gradually customize the search tool to the user's usage.
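The co-occurrence extraction step can be sketched as follows: count the terms appearing in result snippets alongside the query term and offer the most frequent ones as refinement candidates. The counting is deliberately naive, and the names and thresholds are invented.

```java
// Naive sketch of co-occurrence keyword extraction for query refinement:
// terms that frequently appear near the query term become refinement candidates.
import java.util.*;
import java.util.stream.*;

public class CoOccurrence {
    public static List<String> refinements(List<String> snippets, String queryTerm, int top) {
        String q = queryTerm.toLowerCase();
        Map<String, Long> counts = snippets.stream()
            .map(String::toLowerCase)
            .filter(s -> s.contains(q))                       // keep snippets containing the query term
            .flatMap(s -> Arrays.stream(s.split("\\W+")))
            .filter(w -> w.length() > 3 && !w.equals(q))      // drop short tokens and the term itself
            .collect(Collectors.groupingBy(w -> w, Collectors.counting()));
        return counts.entrySet().stream()                     // most frequent co-occurring terms first
            .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
            .limit(top).map(Map.Entry::getKey).toList();
    }
}
```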
Conference Paper
Finding the right information on the World Wide Web is becoming a fundamental problem, since the amount of global information that the WWW contains is growing at an incredible rate. In this paper, we present a novel method to extract from a web object its "hyper" informative content, in contrast with current search engines, which only deal with the "textual" informative content. This method is not only valuable per se, but is shown to considerably increase the precision of current search engines. Moreover, it integrates smoothly with existing search engine technology, since it can be implemented on top of every search engine, acting as a post-processor, thus automatically transforming a search engine into its corresponding "hyper" version. We also show how the hyper information can be usefully employed to address the search engine persuasion problem.
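A minimal sketch of the post-processing idea, assuming a depth-1 link neighborhood and an invented damping factor: a page's "hyper" score blends its own textual score with the scores of the pages it links to, on top of whatever textual scores the underlying engine produced.

```java
// Sketch of a "hyper" score post-processor: blend a page's own textual score
// with (damped) scores of its linked pages. Depth-1 and damping are invented.
import java.util.*;

public class HyperScore {
    public static double hyper(String url,
                               Map<String, Double> textualScore,
                               Map<String, List<String>> outLinks,
                               double damping) {
        double own = textualScore.getOrDefault(url, 0.0);
        double linked = outLinks.getOrDefault(url, List.of()).stream()
            .mapToDouble(u -> textualScore.getOrDefault(u, 0.0))
            .sum();
        return own + damping * linked; // runs on top of any engine's textual scores
    }
}
```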
Article
The Internet Softbot (software robot) is a fully implemented AI agent developed at the University of Washington. It uses a Unix shell and the World-Wide Web to interact with a wide range of Internet resources. Effectors include ftp, telnet, mail, and numerous file manipulation commands; sensors include Internet facilities such as archie, gopher, netfind, and many more. The softbot is designed to incorporate new facilities into its repertoire as they become available.
Article
Migration is the movement of an active entity from one machine to another during execution. Such migration may be used for dynamic load balancing, with the aim of gaining more performance from a group of processors than schemes that simply allocate processes to processors at run time. Schemes providing object migration also offer object persistence, improved fault tolerance, and potentially more efficient remote object invocation (RPC). The survey covers systems providing process migration over both modified and unmodified UNIX and various experimental operating systems. Task migration over two modern microkernel-based operating systems is followed by a section on a number of object migration facilities with objects of varying granularity.
Article
Amalthaea is an evolving, multiagent ecosystem for personalized filtering, discovery, and monitoring of information sites. Amalthaea's primary application domain is the World-Wide Web, and its main purpose is to assist its users in finding interesting information. Two different categories of agents are introduced in the system: filtering agents that model and monitor the interests of the user, and discovery agents that model the information sources. A market-like ecosystem where the agents evolve, compete, and collaborate is presented: agents that are useful to the user or to other agents reproduce, while low-performing agents are destroyed. Results from various experiments with different system configurations and varying ratios of user interests versus agents in the system are presented. Finally, issues such as fine-tuning the initial parameters of the system and establishing and maintaining equilibria in the ecosystem are discussed. Keywords: Agents, Evolution, Information Filtering, W...
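A toy sketch of the market-like loop described above: agents earn fitness when their output is useful, the weakest are destroyed, and the fittest reproduce with mutation. The fitness bookkeeping, reward constants, and mutation noise here are invented; Amalthaea's actual ecosystem is far richer.

```java
// Toy evolutionary loop: reward useful agents, destroy low performers,
// and let the fittest reproduce with mutation. All constants are invented.
import java.util.*;

public class AgentEcosystem {
    static class Agent { double fitness = 1.0; double profile = Math.random(); }

    public static void step(List<Agent> agents, Random rnd) {
        agents.forEach(a -> a.fitness += (rnd.nextDouble() < a.profile) ? 0.2 : -0.1); // reward usefulness
        agents.removeIf(a -> a.fitness <= 0);                                          // destroy low performers
        agents.stream().max(Comparator.comparingDouble((Agent a) -> a.fitness)).ifPresent(best -> {
            Agent child = new Agent();                                                 // fittest agent reproduces
            child.profile = Math.min(1.0, Math.max(0.0, best.profile + rnd.nextGaussian() * 0.05));
            agents.add(child);
        });
    }
}
```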
Article
Standard Web search services, though useful, are far from ideal. There are over a dozen different search services currently in existence, each with a unique interface and a database covering a different portion of the Web. As a result, users are forced to repeatedly try and retry their queries across different services. Furthermore, the services return many responses that are irrelevant, outdated, or unavailable, forcing the user to manually sift through the responses in search of useful information. This paper presents the MetaCrawler, a fielded Web service that represents the next level up in the information "food chain." The MetaCrawler provides a single, central interface for Web document searching. Upon receiving a query, the MetaCrawler posts the query to multiple search services in parallel, collates the returned references, and loads those references to verify their existence and to ensure that they contain relevant information. The MetaCrawler is sufficiently lightweight to reside on a user's machine, which facilitates customization, privacy, sophisticated filtering of references, and more. The MetaCrawler also serves as a tool for comparing diverse search services. Using the MetaCrawler's data, we present a "Consumer Reports" evaluation of six Web search services: Galaxy [5], InfoSeek [1], Lycos [15], Open Text [20], WebCrawler [22], and Yahoo [9]. In addition, we report on the most commonly submitted queries to the MetaCrawler. Keywords: MetaCrawler, WWW, World Wide Web, search, multi-service, multi-threaded, parallel, comparison. This paper appears in the Proceedings of the 1995 World Wide Web Conference.
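The MetaCrawler pattern of fan-out, collation, and verification can be sketched as below, assuming placeholder engine stubs; real HTML scraping and relevance filtering are omitted, and the HEAD-request check merely verifies that a reference still resolves.

```java
// Sketch of the MetaCrawler pattern: fan a query out to several services in
// parallel, collate and dedupe the URLs, then verify each reference resolves.
import java.net.URI;
import java.net.http.*;
import java.util.*;
import java.util.concurrent.*;

public class MiniMetaCrawler {
    interface Engine { List<String> search(String query) throws Exception; }

    public static List<String> metaSearch(List<Engine> engines, String query) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(engines.size());
        List<Future<List<String>>> futures = new ArrayList<>();
        for (Engine e : engines) futures.add(pool.submit(() -> e.search(query)));

        Set<String> collated = new LinkedHashSet<>();            // dedupe while keeping order
        for (Future<List<String>> f : futures)
            try { collated.addAll(f.get(5, TimeUnit.SECONDS)); } catch (Exception ignored) {}
        pool.shutdown();

        HttpClient http = HttpClient.newHttpClient();
        List<String> live = new ArrayList<>();
        for (String url : collated) {                            // verify references still exist
            var head = HttpRequest.newBuilder(URI.create(url))
                .method("HEAD", HttpRequest.BodyPublishers.noBody()).build();
            try {
                if (http.send(head, HttpResponse.BodyHandlers.discarding()).statusCode() < 400)
                    live.add(url);
            } catch (Exception ignored) {}
        }
        return live;
    }
}
```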
Query By Images Content Home Page
  • IBM, Inc.
IBM, Inc. Query By Images Content Home Page. http://wwwqbic.almaden.ibm.com/~qbic/qbic.html.
The Virtual Tourist Home Page
  • Brandon Plewe
Brandon Plewe, The Virtual Tourist Home Page. http://wings.buffalo.edu/world.
WWW Home Pages Harvest Broker
  • Michael Schwartz
Michael Schwartz et al., WWW Home Pages Harvest Broker. http://town.hall.org/Harvest/broker/www-home-pages/.
Orbix Programmer's Guide
"Orbix Programmer's Guide", IONA Technologies Ltd., November 1994.
Inside OLE
  • K. Brockschmidt
K. Brockschmidt, "Inside OLE," 2nd ed., Microsoft Press, Redmond, Washington (1995).