WEBTracker: A Web Crawler for Maximizing Bandwidth Utilization
Abstract and Figures
The most challenging part of a web crawler is downloading content fast enough to fully utilize the available bandwidth while processing the downloaded data quickly enough that the downloader never starves. Our scalable web crawling system, named WEBTracker, has been designed to meet this challenge and can be deployed efficiently in a distributed environment to maximize download throughput. WEBTracker has a Central Crawler Server that administers all crawler nodes. At each crawler node, a Crawler Manager runs the downloader and manages the downloaded content. The Central Crawler Server and its Crawler Managers are members of a Distributed File System, which keeps the distributed operations of the system synchronized. In this paper, we concentrate on the architecture of a single web crawling node, which is owned by its Crawler Manager. We show that our crawler architecture makes efficient use of the allocated bandwidth, keeps the processor lightly loaded while processing downloaded content, and makes efficient use of run-time memory.
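The abstract's central claim is a producer/consumer split inside each crawler node: downloader threads keep the bandwidth busy while separate processor threads consume finished pages, so neither side starves the other. The paper itself is not available here, so the following is only a minimal illustrative sketch of that pattern, not WEBTracker's actual implementation; all class and method names are hypothetical, and fetching is simulated rather than performed over HTTP.

```python
import queue
import threading

class CrawlerNode:
    """Hypothetical sketch of one crawler node: downloader threads feed a
    bounded buffer that processor threads drain. The bounded buffer applies
    backpressure so downloads pause only when processing truly falls behind."""

    def __init__(self, num_downloaders=4, num_processors=2):
        self.url_frontier = queue.Queue()            # URLs waiting to be fetched
        self.downloaded = queue.Queue(maxsize=100)   # bounded hand-off buffer
        self.results = []
        self.lock = threading.Lock()
        self.num_downloaders = num_downloaders
        self.num_processors = num_processors

    def _download(self):
        while True:
            url = self.url_frontier.get()
            if url is None:                          # poison pill: stop this thread
                break
            content = f"<html>{url}</html>"          # simulated fetch (no network)
            self.downloaded.put((url, content))      # blocks if buffer is full

    def _process(self):
        while True:
            item = self.downloaded.get()
            if item is None:                         # poison pill: stop this thread
                break
            url, content = item
            with self.lock:                          # results list is shared
                self.results.append((url, len(content)))

    def crawl(self, urls):
        downloaders = [threading.Thread(target=self._download)
                       for _ in range(self.num_downloaders)]
        processors = [threading.Thread(target=self._process)
                      for _ in range(self.num_processors)]
        for t in downloaders + processors:
            t.start()
        for u in urls:
            self.url_frontier.put(u)
        # Shut down downloaders first, then processors, so no page is dropped.
        for _ in downloaders:
            self.url_frontier.put(None)
        for t in downloaders:
            t.join()
        for _ in processors:
            self.downloaded.put(None)
        for t in processors:
            t.join()
        return self.results
```

Decoupling the two stages through a bounded queue is the standard way to keep a downloader saturated: the only time a download thread blocks is when the buffer is full, which by construction means the processors, not the network, are the bottleneck.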
Figures uploaded by Md. Akter Hussain.