ABSTRACT: With the rapid development of the Internet, many recommender systems have emerged in e-commerce applications to support product recommendation. However, most centralized recommender systems based on collaborative filtering cannot work effectively when a large number of users send requests to them. This paper proposes PRBOCF (Product Recommendation Based On Collaborative Filtering), a scalable mechanism for recommending products in a distributed way in P2P networks. In PRBOCF, image and text, the two main parts of product information, are weighted separately and their features are represented by a single vector. To improve the quality of the text representation under the Vector Space Model, WordNet v2.0 is employed to handle the relationships between words in the text. A peer's preference is then represented by a feature space consisting of the vectors of all its saved product information. To keep the recommender system scalable and the recommendation quality high, PRBOCF makes product recommendations by searching for neighbor peers with similar preferences, using only local information about recent ratings. Finally, the simulation results are discussed.
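The abstract above describes weighting image and text features into one vector and comparing peers by preference. A minimal sketch of that idea follows. It assumes concatenation of the weighted parts and cosine similarity as the comparison; the abstract does not state either choice, and the function names and default weights are hypothetical.

```python
import math


def combine_features(image_vec, text_vec, w_image=0.5, w_text=0.5):
    """Scale the image and text features by their weights and join them.

    Concatenation is an assumption made for illustration; the abstract
    only says that both parts are weighted and represented by one vector.
    """
    return [w_image * x for x in image_vec] + [w_text * x for x in text_vec]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Two hypothetical product vectors held by different peers.
product_a = combine_features([0.2, 0.8], [1.0, 0.0, 3.0])
product_b = combine_features([0.1, 0.9], [0.0, 0.0, 2.0])
similarity = cosine_similarity(product_a, product_b)
```

A peer could rank candidate neighbors by such a score and forward recommendation requests only to the closest ones, which is one way to avoid a central server.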
ABSTRACT: Web Services technology was initiated by industry, and many large companies are working on it and applying it in real-world applications. However, existing Web Service registration and discovery are handled in a centralized way, which cannot easily scale to a growing number of users. With its advantages of scalability and fault tolerance, P2P technology is naturally suited to enhancing the scalability of Web Services and can also make them easy to discover. Because P2P networks can provide the framework for discovery, publication, and registration of Web Services, a central registry is not required when integrating Web Services. In this paper, we propose a methodology consisting of two parts to integrate distributed Web Services in Gnutella-like networks and make those services collaborative. The first part publishes Web Service information to high-degree nodes, so that many information chains of Web Services are formed in Gnutella. The second part combines the high-degree walk and random walk methods to discover the target Web Service. Finally, we analyze the performance of the proposed methodology through simulation. The results show that the methodology achieves a high success rate, low traffic, and load balance, thereby making the integration of Web Services scalable.
ABSTRACT: A topic-specific crawler aims to selectively seek out pages that are relevant to a pre-defined set of topics, rather than to explore all regions of the Web. It is important for domain-specific resource discovery. By restricting themselves to web pages from a specific domain, topic-specific crawlers yield good recall as well as good precision. In this paper, we present an integrated topic-specific crawling strategy. The main features of the crawling process are a topic specification module, which mediates between users and search engines to identify starting URLs by computing the hub score using the BHIST algorithm, and a URL ordering algorithm that combines features of several previous approaches. Experimental results indicate that the new crawling method performs better and is able to fetch information with higher topic relevance.
ABSTRACT: The focused crawler of a special-purpose search engine aims to selectively seek out pages that are relevant to a pre-defined set of topics, rather than to explore all regions of the Web. The PageRank algorithm is often used for ranking web pages, and it is also used for URL ordering in focused crawlers. It estimates a page's authority by taking into account the link structure of the Web. However, it assigns each outlink the same weight and is independent of topics, which results in topic drift. In this paper, we propose an improved PageRank algorithm, which we call "To-PageRank", and we present a crawling strategy that uses the To-PageRank algorithm combined with the topic similarity of the hyperlink metadata. Experiments with a focused crawler show that the improved crawling strategy performs better than the breadth-first and PageRank algorithms.
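To illustrate the limitation named above (standard PageRank gives every outlink the same weight), here is a minimal sketch of a PageRank variant in which each page splits its rank among its outlinks in proportion to a topic relevance score of the target. This is not the authors' To-PageRank, whose definition the abstract does not give; the `relevance` scores, the function name, and the uniform fallback for pages without relevant outlinks are all assumptions.

```python
def weighted_pagerank(links, relevance, damping=0.85, iterations=50):
    """Power iteration with outlink weights proportional to topic relevance.

    links:     dict mapping a page to the list of pages it links to
    relevance: dict mapping a page to a non-negative topic score (hypothetical input)
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}

    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            targets = links.get(p, [])
            total = sum(relevance.get(t, 0.0) for t in targets)
            if not targets or total == 0:
                # No outlinks, or none on topic: spread this page's rank uniformly.
                share = damping * rank[p] / n
                for q in pages:
                    new_rank[q] += share
                continue
            for t in targets:
                new_rank[t] += damping * rank[p] * relevance.get(t, 0.0) / total
        rank = new_rank
    return rank


links = {"a": ["b", "c"], "b": ["a"], "c": ["a"]}
relevance = {"a": 1.0, "b": 0.9, "c": 0.1}
ranks = weighted_pagerank(links, relevance)
```

In this toy graph, page `b` ends up ranked above page `c` even though both receive exactly one link from `a`, because the link to `b` carries more of `a`'s rank. A focused crawler could order its URL frontier by such scores.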
ABSTRACT: Unstructured Peer-to-Peer networks, such as Gnutella, are popular for certain applications because they require neither centralized directories nor precise control over network topology or data placement. However, the network topology and the placement of files in unstructured P2P networks are largely unconstrained, so an efficient search algorithm is important for locating resources. Because the link distribution of an unstructured P2P topology follows a power law, this paper presents a replication-spread mechanism for resource location that utilizes high-degree nodes. Based on this spread mechanism, we propose a novel search method that combines the high-degree walk method and the random walk method. Finally, the simulation results show that the method can achieve high success rates, reduce the search traffic, and balance the load in power-law networks.
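One way to picture a search that combines a high-degree walk with a random walk is sketched below. The abstract does not say how the two are combined, so the switching rule used here (prefer the highest-degree unvisited neighbor for the first few hops, then walk randomly) is an assumption, as are the function and parameter names.

```python
import random


def hybrid_walk_search(graph, holders, start, max_hops=20,
                       high_degree_hops=5, rng=None):
    """Walk from `start` until a node in `holders` is reached.

    graph:   dict mapping a node to the list of its neighbors
    holders: set of nodes that hold the resource (or a replica of it)
    Returns the path taken as a list of nodes, or None if the walk fails.
    """
    rng = rng or random.Random()
    current = start
    visited = {start}
    path = [start]
    for hop in range(max_hops):
        if current in holders:
            return path
        neighbors = graph.get(current, [])
        if not neighbors:
            return None
        unvisited = [n for n in neighbors if n not in visited]
        if hop < high_degree_hops and unvisited:
            # High-degree phase: move to the best-connected unvisited neighbor.
            nxt = max(unvisited, key=lambda n: len(graph.get(n, [])))
        else:
            # Random phase: pick any unvisited neighbor, or any neighbor at all.
            nxt = rng.choice(unvisited or neighbors)
        visited.add(nxt)
        path.append(nxt)
        current = nxt
    return path if current in holders else None


graph = {
    "a": ["b", "h"], "b": ["a"],
    "h": ["a", "c", "d", "e"],
    "c": ["h"], "d": ["h"], "e": ["h"],
}
path = hybrid_walk_search(graph, holders={"h"}, start="a")
```

If resource information is replicated on high-degree nodes, as the abstract proposes, the first phase tends to reach a replica within a few hops, and the random phase covers the case where it does not.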
ABSTRACT: The rapid growth of the World-Wide Web poses unprecedented scaling challenges for general-purpose crawlers. Focused crawlers have been developed to collect web pages relevant to topics of interest from the Internet. The PageRank algorithm is used for ranking web pages. It estimates a page's authority by taking into account the link structure of the Web. However, it assigns each outlink the same weight and is independent of topics, which results in topic drift. In this paper, we propose an improved PageRank algorithm, which we call "T-PageRank", based on a "topical random surfer". Experiments show that a focused crawler using T-PageRank performs better than the breadth-first and PageRank algorithms.
ABSTRACT: Gnutella is one of the prominent unstructured P2P networks. In unstructured P2P networks, the network topology and the placement of files are largely unconstrained. The primitive search algorithm in Gnutella is flooding-based. It produces too much traffic, even though its search region is limited by the TTL value. To improve search efficiency and reduce unnecessary traffic in Gnutella, this paper proposes a search algorithm for unstructured P2P networks that consists of a ranked neighbor caching scheme and a queryhit caching scheme. The proposed algorithm can extend the search region while reducing the search traffic, and it also balances the network load, thereby making the whole network scalable.
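The traffic cost of TTL-limited flooding mentioned above can be seen in a small simulation. The sketch below is a simplified model, not the Gnutella protocol itself: each round, every node that just received the query forwards it to all neighbors except the sender, duplicates are counted as traffic but dropped, and the names `flood_search`, `graph`, and `holders` are hypothetical.

```python
def flood_search(graph, holders, start, ttl):
    """Simulate a flooding query with a hop limit.

    graph:   dict mapping a node to the list of its neighbors
    holders: set of nodes that hold the requested file
    Returns (set of holders reached, number of query messages sent).
    """
    seen = {start}
    found = {start} if start in holders else set()
    frontier = [(start, None)]  # (node, neighbor it received the query from)
    messages = 0
    for _ in range(ttl):
        next_frontier = []
        for node, sender in frontier:
            for neighbor in graph.get(node, []):
                if neighbor == sender:
                    continue  # do not send the query back where it came from
                messages += 1
                if neighbor in seen:
                    continue  # duplicate query: costs a message, then is dropped
                seen.add(neighbor)
                if neighbor in holders:
                    found.add(neighbor)
                next_frontier.append((neighbor, node))
        frontier = next_frontier
    return found, messages


line = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
triangle = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b"]}
```

On the `line` graph a TTL of 2 misses the file at `d` while a TTL of 3 finds it, and on the `triangle` graph half of the messages sent with a TTL of 2 are duplicates. Caching schemes such as those the abstract proposes aim to reach more of the network without paying this duplicate traffic.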
ABSTRACT: We propose a search algorithm for unstructured P2P networks that consists of ranked neighbor caching, queryhit caching, and file replication to free riders. The simulation results show that the algorithm can extend the search region while reducing the search traffic, and that it also balances the network load, thereby making the whole network scalable.