Conference Paper

Techniques for achieving high performance Web servers

Abstract

With increasing bandwidth available to the client and the number of users growing at an exponential rate, the Web server can become a performance bottleneck. This paper considers the parallelization of requests to Web pages, each of which is composed of a number of embedded objects. The performance of systems in which the embedded objects are distributed across multiple backend servers is analyzed. Parallelization of Web requests gives rise to a significant improvement in performance. Replication of servers is observed to be beneficial, especially when the embedded objects in a Web page are not evenly distributed across servers. Load balancing policies used by the dispatcher of Web page requests are investigated. A simple round robin policy for backend server selection gives better performance than the default random policy used by the Apache server.
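For illustration of the dispatching idea in the abstract, here is a minimal Python sketch, assuming hypothetical backend hostnames, that fetches a page's embedded objects in parallel and supports both a round-robin and a random backend selector. It only models the two policies being compared, not the paper's actual dispatcher or Apache's implementation.

```python
import itertools
import random
import urllib.request
from concurrent.futures import ThreadPoolExecutor

# Hypothetical backend servers holding the embedded objects of a page.
BACKENDS = ["http://backend1.example.com",
            "http://backend2.example.com",
            "http://backend3.example.com"]

_rr = itertools.cycle(BACKENDS)

def pick_round_robin():
    """Select backends in strict rotation (the policy favoured in the abstract)."""
    return next(_rr)

def pick_random():
    """Select a backend uniformly at random (the baseline random policy)."""
    return random.choice(BACKENDS)

def fetch(url, timeout=5.0):
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()

def fetch_page_objects(object_paths, pick_backend):
    """Fetch all embedded objects of one page in parallel, one backend pick per object."""
    with ThreadPoolExecutor(max_workers=len(object_paths)) as pool:
        futures = [pool.submit(fetch, pick_backend() + path) for path in object_paths]
        return [f.result() for f in futures]

# Example (hypothetical object paths):
# fetch_page_objects(["/img/logo.png", "/css/site.css", "/js/app.js"], pick_round_robin)
```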


... Second, we use the mathematical software package QNAT (queue network analysis tool) [6][7][8] to conduct simulation. In addition, we propose a mathematical model based on a queuing model to compute either the web delay or the average system response time E(t). ...
... Therefore DNS-based TTL (time-to-live) can be used to determine how much time will be required to find an intermediate name server; otherwise the DNS cluster cannot select a suitable web server. This approach is a good idea, but the cluster DNS server often causes additional delays, as too much traffic passes through the intermediate name server [1][2][3][4][5][6][7]. ...
... There are three cases: without network delay, with network delay, and with a variation feedback and network delay. Among the three we find a way to locate the load balancing by using the friendly queuing simulation tool QNAT [1,[6][7][8]. ...
Article
Full-text available
Based on our survey of recent articles, there is little research being conducted into quantitative analysis of the load balancing of Web server clusters. In this paper, we propose a quantitative analysis for DNS-based server clusters. We also propose a two-pass load-balancing method for determining the load balance area of these clusters. The first pass uses a lookup table instead of a complicated computation to obtain the load balancing of the Web service requests. The second pass also uses a lookup table, built from a precomputed Hessian matrix, to obtain the load balancing. In addition, we compare the relative performance of dispatcher-based and DNS-based server clusters using queuing theory, analysis, and simulation, and we compare the measurement results using benchmarks. To increase the simulation performance we have designed a simulation module to promptly locate the load balancing, with a potential improvement of 36.58% in the average system response time.
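The excerpt above mentions using a queuing model to compute the average system response time E(t). Below is a minimal sketch of that kind of calculation, assuming each backend is approximated as an independent M/M/1 queue; this is an assumption of the example, not necessarily the model used in the cited work.

```python
def mm1_response_time(arrival_rate, service_rate):
    """Average response time E[T] of an M/M/1 queue; requires utilisation < 1."""
    if arrival_rate >= service_rate:
        raise ValueError("queue is unstable: arrival rate must be below service rate")
    return 1.0 / (service_rate - arrival_rate)

def system_response_time(arrival_rate, weights, service_rates):
    """Average response time when total traffic is split among servers by `weights`."""
    total_w = sum(weights)
    times = [mm1_response_time(arrival_rate * w / total_w, mu)
             for w, mu in zip(weights, service_rates)]
    # Weight each server's E[T] by the fraction of requests it receives.
    return sum((w / total_w) * t for w, t in zip(weights, times))

# Example: 90 req/s split evenly across three servers of 40 req/s each.
# print(system_response_time(90.0, [1, 1, 1], [40.0, 40.0, 40.0]))
```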
... Two kinds of granularity are proposed to distribute data in the architecture, namely, basic components of web pages and sets of web pages. Instead of storing complete web pages on distributed servers, [3, 19, 20] propose to split web pages into several weblets, which are components of web pages that can be executed independently to generate data in pages. When generating web pages, only the weblets need to be recomputed; static parts of pages can be fetched from caches directly. ...
Article
Full-text available
As enterprises worldwide race to improve real-time management to improve productivity, customer services and flexibility, huge resources have been invested into enterprise systems (ESs). All modern ESs adopt an n-tier client-server architecture, which includes several application servers to hold users and applications. As in any other multi-server environment, the load distributions, and user distributions in particular, become a critical issue in tuning system performance. In stateful ESs, a user logs onto an application server and stays connected to it for an entire working session, which can last for days, invoking applications from that server. Therefore, admitting a user onto an application server affects not only the current but also the future performance of that server. Although the n-tier architecture may involve web servers, there is little in the literature on Distributed Web Server Architectures that considers the effects of distributing users instead of individual requests to servers. The algorithm proposed in this paper gives specific suggestions on user distributions and the minimal number of servers required based on the application reusability threshold. A heuristic version of the algorithm is also presented to improve performance. The paper also discusses how to apply association rules to predict new user behavior when distributing users at run-time. The distributions recommended by the algorithms are compared against Round-Robin distributions on a set of real data derived from a mid-size company. The result shows that the suggested user distributions perform better than Round-Robin distributions.
... The decision about where to route a request from a client to the target server can be made at several places. The four possible approaches to routing client requests to the target server are [NM00]: (i) Web client-based, (ii) DNS-based, (iii) dispatcher-based, and (iv) server-based. In the first approach, the client, who originates the request, is responsible for request routing. ...
... Many performance-testing experiments have been performed to uncover which algorithm is most efficient for load balancing web traffic [21,18,30,11,28,1,31,3,23,4,15]. These experiments use either trace-driven or program-generated traffic. ...
Article
As web pages become more user friendly and interactive we see that objects such as pictures, media files, cgi scripts and databases are more frequently used. This development causes increased stress on the servers due to intensified cpu usage and a growing need for bandwidth to serve the content. At the same time users expect low latency and high availability. This dilemma can be solved by implementing load balancing between servers serving content to the clients. Load balancing can provide high availability through redundant server solutions, and reduce latency by dividing load. This paper describes a comparative study of different load balancing algorithms used to distribute packets among a set of equal web servers serving HTTP content. For packet redirection, a Nortel Application Switch 2208 will be used, and the servers will be hosted on 6 IBM blade servers. We will compare three different algorithms: Round Robin, Least Connected and Response Time. We will look at properties such as response time, traffic intensity and type. How will these algorithms perform when these variables change with time? If we can find correlations between traffic intensity and efficiency of the algorithms, we might be able to deduce a theoretical suggestion on how to create an adaptive load balancing scheme that uses current traffic intensity to ...
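For illustration, the three algorithms compared in this study can be expressed as selection rules over per-server state. The sketch below is a simplified model, assuming the dispatcher itself tracks active connection counts and the last measured response times; it does not reflect the Nortel switch's actual implementation.

```python
import itertools

class Backend:
    def __init__(self, name):
        self.name = name
        self.active_connections = 0      # updated by the dispatcher
        self.last_response_time = 0.0    # seconds, updated from measurements

class Dispatcher:
    """Toy dispatcher illustrating the three selection policies compared in the study."""
    def __init__(self, backends):
        self.backends = backends
        self._rr = itertools.cycle(backends)

    def round_robin(self):
        """Serve the backends strictly in turn."""
        return next(self._rr)

    def least_connected(self):
        """Pick the backend currently handling the fewest connections."""
        return min(self.backends, key=lambda b: b.active_connections)

    def response_time(self):
        """Pick the backend with the lowest last measured response time."""
        return min(self.backends, key=lambda b: b.last_response_time)
```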
... With the Internet rush, much research has been devoted to distributing user requests with Distributed Web Server Architectures to improve the performance of web servers. Depending on the locations where request distribution happens, these approaches are classified into client-based, DNS (Domain Name Server)-based, dispatcher-based, and server-based by [5,4,14,15]. Since the current HTTP protocol is stateless, each request is routed independently to a web server [4]. ...
Conference Paper
Full-text available
As enterprises world-wide race to embrace real-time management to improve productivity, customer services, and flexibility, many resources have been invested in enterprise systems (ESs). All modern ESs adopt an n-tier client-server architecture that includes several application servers to hold users and applications. As in any other multi-server environment, the load distributions, and user distributions in particular, become a critical issue in tuning system performance. Although the n-tier architecture may involve web servers, little literature on Distributed Web Server Architectures has considered the effects of distributing users instead of individual requests to servers. The algorithm proposed in this paper returns specific suggestions, including explicit user distributions, the number of servers needed, and the similarity of user requests in each server. The paper also discusses how to apply the knowledge of past patterns to allocate new users, who have no request patterns, in a hybrid dispatching program.
... With regard to distributing loads, the Random algorithm selects links or servers randomly regardless of the load borne on each line, leaving the system unbalanced. Its load-balancing performance is worse than that of RR [8]. The RR algorithm, first used in network problems by Nagle [9], is the simplest one: it simply serves the links in turn. ...
Article
This study presents a new method [In–Out Combined Dynamic Weighted Round-Robin (CDWRR)] for balancing network loads in multiple link environments that provide both input and output interfaces. The purpose of this research is to balance loads for reducing Internet traffic jams. Using the novel line detection prediction approach, the main benefit of the presented algorithm is that it does not need to trace the system load every instant but achieves a far better load balancing than the traditional Round-Robin does. In addition, CDWRR keeps multiple links balanced in both input and output links. Mathematical equations are developed for predicting the best time instant of detecting, and both theoretical and practical approaches are provided for performance comparisons. Results obtained from the simulations show that CDWRR is effective and is efficient in maintaining load balance for multiple input and output network structures.
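For context, the base mechanism that CDWRR extends is weighted round-robin. The sketch below shows plain WRR only; the dynamic, prediction-driven weight adjustment described in the abstract is not modelled, and the interleaving scheme used here is an assumption of this example.

```python
def weighted_round_robin(links):
    """Generator yielding link names in proportion to their weights.

    `links` is a list of (name, weight) pairs with positive integer weights.
    This is the plain WRR base mechanism; CDWRR additionally adjusts weights
    dynamically from predicted line loads, which is not modelled here.
    """
    # Build one full cycle in which each link appears `weight` times, interleaved.
    schedule = []
    remaining = {name: weight for name, weight in links}
    while any(remaining.values()):
        for name, _weight in links:
            if remaining[name] > 0:
                schedule.append(name)
                remaining[name] -= 1
    while True:
        for name in schedule:
            yield name

# Example: link A gets twice the traffic of link B.
# picker = weighted_round_robin([("A", 2), ("B", 1)])
# [next(picker) for _ in range(6)]  # -> ['A', 'B', 'A', 'A', 'B', 'A']
```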
Article
With the advent of cloud computing, web servers, as the major channel in cloud computing, need to be redesigned to meet performance and power constraints. Considerable efforts have been invested in distributed web servers and web caching with different optimizing strategies, but few existing studies have focused directly on improving the web server itself, not to mention complete hardware-favored web services. In this paper, we propose a novel architecture of web server and implement it on FPGA. After tackling significant difficulties in design and implementation, we managed to complete an evaluation system which confirms that the hardware-favored architecture brings higher throughput, lower power consumption as well as stand-alone web service functionalities due to direct pipelining execution of web ...
Article
As enterprises worldwide race to embrace real-time management to improve productivity, customer services and flexibility, a large amount of resources has been invested in Enterprise Systems (ESs). As a comprehensive feature of these modern systems, they utilize an n-tier client-server architecture that includes several application servers to serve users and host applications. As in any other multi-server environment, the load and user distributions become a critical issue in performance tuning of these enterprise systems. This paper proposes an algorithm to distribute users who evoke similar transactions to the same servers, which have limited buffer sizes. The number of transactions that can be hosted in each server is constrained by the buffer size multiplied by a factor specified by system administrators. Based on user profiles, the algorithm returns suggestions on user distributions, the number of servers needed, and similar user requests in each server. In addition, it discusses how to apply the knowledge of existing user patterns to distribute new users who do not have enough entries in the profile and thus have no distribution suggestion at run-time.
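As a rough illustration of the kind of assignment problem described (not the paper's algorithm), the sketch below greedily places users who evoke similar transactions on the same server, assuming Jaccard similarity over transaction sets and a per-server transaction budget standing in for the buffer-size constraint; both choices are assumptions of this example.

```python
def jaccard(a, b):
    """Similarity of two transaction sets (0 when both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def assign_users(user_transactions, capacity):
    """Greedy sketch: place each user on the server whose current users evoke
    the most similar transactions, opening a new server when none has room.

    user_transactions: dict mapping user -> set of transaction names
    capacity: maximum number of distinct transactions a server may host
    Returns a list of servers, each {"users": [...], "transactions": set(...)}.
    """
    servers = []
    for user, txns in user_transactions.items():
        best, best_score = None, -1.0
        for srv in servers:
            if len(srv["transactions"] | txns) > capacity:
                continue  # would exceed this server's transaction budget
            score = jaccard(srv["transactions"], txns)
            if score > best_score:
                best, best_score = srv, score
        if best is None:
            best = {"users": [], "transactions": set()}
            servers.append(best)
        best["users"].append(user)
        best["transactions"] |= txns
    return servers
```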
Article
The purpose of this paper is to determine the values of the parameters in a new method (Dynamic Weighted Round-Robin, DWRR) developed for solving Internet traffic jam problems. Using the traditional Round-Robin (RR) method as a base, DWRR was developed to efficiently control loads in a multiple-link network. Unlike the least-load algorithm, DWRR does not need to trace system loads continually, but achieves far better load balancing than RR does. Mathematical functions are developed for predicting the optimal time interval for detecting line loads in this method, while the concept of variance in statistics is used as the criterion for evaluating the load balance level. A couple of related coefficients have also been determined by analyzing the simulation data. A centralized gateway with a multi-link load balancer is modeled for explaining the proposed algorithm. In addition, both theoretical and practical approaches are provided in this paper, along with performance comparisons between them. The results obtained from the computational experiments show that DWRR achieves superior network load balancing.
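The abstract uses the statistical variance of line loads as the criterion for the load balance level. Here is a minimal sketch of that criterion, assuming an unnormalised population variance; the exact form used in the paper may differ.

```python
def load_variance(loads):
    """Variance of per-link loads; lower variance means better balance.

    Uses the population variance of the raw loads; whether the paper
    normalises the loads or uses a sample variance is left open here.
    """
    n = len(loads)
    mean = sum(loads) / n
    return sum((x - mean) ** 2 for x in loads) / n

# Example: perfectly balanced vs skewed traffic on three links (Mbit/s).
# load_variance([30, 30, 30])  # -> 0.0
# load_variance([10, 20, 60])  # -> about 466.67
```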
Conference Paper
Architectural change heuristics are a very powerful mechanism for implementing architectural optimisation. They allow for both the capture of the systematic changes required to maintain system integrity and the often poorly understood rationale of expert knowledge. However, even though heuristics are one of the oldest and most widely used problem-solving mechanisms, they are also perhaps one of the most mis-used and ill-defined. In order to understand how heuristics can be used in optimising system architectures it is important to understand the nature of heuristics especially as they apply to architectural optimisation. This paper presents a framework that can be used to categorise and classify heuristics as they are used in system optimisation. It is anticipated that this framework will provide a common foundation within which to discuss heuristics in architectural optimisation
Article
Full-text available
A distributed multiserver Web site can provide the scalability necessary to keep up with growing client demand at popular sites. Load balancing of these distributed Web-server systems, consisting of multiple, homogeneous Web servers for document retrieval and a Domain Name Server (DNS) for address resolution, opens interesting new problems. In this paper, we investigate the effects of using a more active DNS which, as an atypical centralized scheduler, applies some scheduling strategy in routing the requests to the most suitable Web server. Unlike traditional parallel/distributed systems in which a centralized scheduler has full control of the system, the DNS controls only a very small fraction of the requests reaching the multiserver Web site. This peculiarity, especially in the presence of highly skewed load, makes it very difficult to achieve acceptable load balancing and avoid overloading some Web servers. This paper adapts traditional scheduling algorithms to the DNS, proposes new policies, and examines their impact under different scenarios. Extensive simulation results show the advantage of strategies that make scheduling decisions on the basis of the domain that originates the client requests and limited server state information (e.g., whether a server is overloaded or not). An initially unexpected result is that using detailed server information, especially based on history, does not seem useful in predicting the future load and can often lead to degraded performance
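As an illustration of scheduling with limited server state, the sketch below implements a DNS-level round-robin that skips servers flagged as overloaded. It is a simplification: the policies studied in the paper also use the identity of the requesting domain, which is omitted here, and the data structures are assumptions of this example.

```python
import itertools

class DnsScheduler:
    """Round-robin at the DNS, skipping servers flagged as overloaded.

    Mimics the 'limited server state' idea (an overloaded bit per server);
    the domain-aware policies described in the paper are not modelled.
    """
    def __init__(self, servers):
        self.servers = list(servers)
        self.overloaded = {s: False for s in self.servers}
        self._rr = itertools.cycle(self.servers)

    def set_overloaded(self, server, flag):
        self.overloaded[server] = flag

    def resolve(self):
        # Try one full cycle; fall back to plain round-robin if all are overloaded.
        for _ in range(len(self.servers)):
            candidate = next(self._rr)
            if not self.overloaded[candidate]:
                return candidate
        return next(self._rr)

# dns = DnsScheduler(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
# dns.set_overloaded("10.0.0.2", True)
# dns.resolve()  # returns an address of a non-overloaded server
```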
Article
The HTTP/1.1 protocol is the result of four years of discussion and debate among a broad group of Web researchers and developers. It improves upon its phenomenally successful predecessor, HTTP/1.0, in numerous ways. We discuss the differences between HTTP/1.0 and HTTP/1.1, as well as some of the rationale behind these changes.
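One of the headline differences is that HTTP/1.1 makes persistent connections the default and requires the Host header, so several requests can reuse a single TCP connection. A small sketch using Python's standard http.client, which speaks HTTP/1.1 and sends Host automatically; the hostname and paths are hypothetical.

```python
import http.client

# Reuse one TCP connection for several requests (HTTP/1.1 persistent connection).
conn = http.client.HTTPConnection("www.example.com", 80, timeout=5)
for path in ("/index.html", "/style.css", "/logo.png"):
    conn.request("GET", path)
    resp = conn.getresponse()
    body = resp.read()          # drain the body before reusing the connection
    print(path, resp.status, len(body))
conn.close()
```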
Article
The popularity of the Internet, and the usage of the world wide web in particular, has grown rapidly in recent years. Thousands of companies have deployed Web servers and their usage rates have increased dramatically. Our research has focused on measuring, analyzing and evaluating the performance of Internet and Intranet Web servers with a goal of creating capacity planning models. We have created layered queuing models (LQMs) and demonstrated their superiority to traditional queuing network models since they incorporate layered resource demands. Along the way we built a tool framework that enables us to collect and analyze the empirical data necessary to accomplish our goals.This paper describes the custom instrumentation we developed and deployed to collect workload metrics and model parameters from large-scale, commercial Internet and Intranet Web servers. We discuss the measurement issues pertaining to model parametrization and validation. We describe an object-oriented tool framework that significantly improves the productivity of analyzing the nearly 100 GBs of measurements collected during this workload study interval. Finally, we describe the LQM we developed to estimate client response time at a Web server. The model predicts the impact on server and client response times as a function of network topology and Web server pool size. We also use it to consider the consequences of server system configuration changes such as decreasing the HTTP object cache size.
Conference Paper
Under high loads, a Web server may be servicing many hundreds of connections concurrently. In traditional ...
Conference Paper
The phenomenal growth in popularity of the World Wide Web (WWW, or the Web) has made WWW traffic the largest contributor to packet and byte traffic on the NSFNET backbone. This growth has triggered recent research aimed at reducing the volume of network traffic produced by Web clients and servers, by using caching, and reducing the latency for WWW users, by using improved protocols for Web interaction. Fundamental to the goal of improving WWW performance is an understanding of WWW workloads. This paper presents a workload characterization study for Internet Web servers. Six different data sets are used in this study: three from academic (i.e., university) environments, two from scientific research organizations, and one from a commercial Internet provider. These data sets represent three different orders of magnitude in server activity, and two different orders of magnitude in time duration, ranging from one week of activity to one year of activity. Throughout the study, emphasis is placed on finding workload invariants: observations that apply across all the data sets studied. Ten invariants are identified. These invariants are deemed important since they (potentially) represent universal truths for all Internet Web servers. The paper concludes with a discussion of caching and performance issues, using the invariants to suggest performance enhancements that seem most promising for Internet Web servers.
Article
Popular Web sites cannot rely on a single powerful server nor on independent mirrored-servers to support the ever-increasing request load. Distributed Web server architectures that transparently schedule client requests offer a way to meet dynamic scalability and availability requirements. The authors review the state of the art in load balancing techniques on distributed Web-server systems, and analyze the efficiencies and limitations of the various approaches
Article
This paper presents a workload characterization study for Internet Web servers. Six different data sets are used in the study: three from academic environments, two from scientific research organizations, and one from a commercial Internet provider. These data sets represent three different orders of magnitude in server activity, and two different orders of magnitude in time duration, ranging from one week of activity to one year. The workload characterization focuses on the document type distribution, the document size distribution, the document referencing behavior, and the geographic distribution of server requests. Throughout the study, emphasis is placed on finding workload characteristics that are common to all the data sets studied. Ten such characteristics are identified. The paper concludes with a discussion of caching and performance issues, using the observed workload characteristics to suggest performance enhancements that seem promising for Internet Web servers
Article
One role for workload generation is as a means for understanding how servers and networks respond to variation in load. This enables management and capacity planning based on current and projected usage. This paper applies a number of observations of Web server usage to create a realistic Web workload generation tool which mimics a set of real users accessing a server. The tool, called Surge (Scalable URL Reference Generator), generates references matching empirical measurements of 1) server file size distribution; 2) request size distribution; 3) relative file popularity; 4) embedded file references; 5) temporal locality of reference; and 6) idle periods of individual users. This paper reviews the essential elements required in the generation of a representative Web workload. It also addresses the technical challenges to satisfying this large set of simultaneous constraints on the properties of the reference stream, the solutions we adopted, and their associated accuracy. Finally, we present evidence that Surge exercises servers in a manner significantly different from other Web server benchmarks.
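For illustration, two of the empirical properties Surge matches, file popularity and user idle periods, can be approximated with a Zipf-like distribution and a heavy-tailed Pareto distribution respectively. The sketch below uses arbitrary parameter values, not Surge's calibrated ones, and ignores the other constraints (file sizes, request sizes, embedded references, temporal locality).

```python
import random

def zipf_popularity(num_files, alpha=1.0):
    """Zipf-like popularity: probability of requesting file i is proportional to 1/i^alpha."""
    weights = [1.0 / (i ** alpha) for i in range(1, num_files + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def generate_requests(num_files=1000, num_requests=10, alpha=1.0, idle_shape=1.5):
    """Yield (file_id, idle_seconds) pairs for one simulated user."""
    probs = zipf_popularity(num_files, alpha)
    files = list(range(1, num_files + 1))
    for _ in range(num_requests):
        file_id = random.choices(files, weights=probs, k=1)[0]
        idle = random.paretovariate(idle_shape)  # heavy-tailed OFF period
        yield file_id, idle

# for f, idle in generate_requests():
#     print(f"request file {f}, then stay idle {idle:.2f}s")
```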
Article
With the increasing popularity of the World Wide Web, the amount of information available and the use of Web servers are growing exponentially. In order to reduce the overhead induced by frequent requests to the same documents by local users, client caching and, more generally, proxy-caching have been proposed and are now widely used. Most implementations use traditional memory paging policies, like the Least Recently Used (LRU) policy. However, due to the heterogeneity of the requests of Web traffic, both in the size of the documents and in the network transfer delays, such caching policies are not very efficient. In this work, we propose a new caching policy which takes into account the network latency, the size of the documents, their access frequencies, and the time elapsed since the last reference to documents in the cache. Through trace-driven simulations and for various standard cost criteria (request hit rate, byte hit rate and latency ratio) we show that our policy performs bett...
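As a sketch of a cost-aware replacement rule of this general kind (combining latency, size, access frequency, and recency), the value function below favours documents that are slow to re-fetch, small, popular, and recently used. The exact weighting used by the cited policy is not reproduced here; this particular form is an assumption of the example.

```python
import time

def cache_value(latency, size, frequency, last_access, now=None):
    """Higher value = more worth keeping in the cache.

    latency: measured fetch latency in seconds; size: bytes;
    frequency: access count; last_access: timestamp of last reference.
    The combination below is illustrative, not the cited policy's formula.
    """
    now = time.time() if now is None else now
    age = max(now - last_access, 1e-9)
    return (latency * frequency) / (size * age)

def evict_one(cache):
    """Evict the lowest-valued entry. `cache` maps url -> dict with exactly the
    keys latency, size, frequency, last_access."""
    victim = min(cache, key=lambda url: cache_value(**cache[url]))
    del cache[victim]
    return victim
```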
Article
This paper describes httperf, a tool for measuring web server performance. It provides a flexible facility for generating various HTTP workloads and for measuring server performance. The focus of httperf is not on implementing one particular benchmark but on providing a robust, high-performance tool that facilitates the construction of both micro- and macro-level benchmarks. The three distinguishing characteristics of httperf are its robustness, which includes the ability to generate and sustain server overload, its support for the HTTP/1.1 protocol, and its extensibility to new workload generators and performance measurements. In addition to reporting on the design and implementation of httperf, this paper also discusses some of the experiences and insights gained while realizing this tool.
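httperf itself is a purpose-built C tool; purely as a toy illustration of the micro-level measurement idea (issue requests at a fixed rate and record reply times), here is a short Python sketch with a hypothetical target URL. Being closed-loop, it cannot generate and sustain server overload the way httperf can.

```python
import time
import urllib.request

def measure(url, num_requests=100, rate=10.0, timeout=5.0):
    """Issue requests at roughly `rate` per second and report reply statistics."""
    interval = 1.0 / rate
    latencies, errors = [], 0
    for _ in range(num_requests):
        start = time.monotonic()
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                resp.read()
            latencies.append(time.monotonic() - start)
        except OSError:
            errors += 1
        # Sleep off the remainder of this request's slot to hold the target rate.
        time.sleep(max(0.0, interval - (time.monotonic() - start)))
    if latencies:
        print(f"replies: {len(latencies)}, errors: {errors}, "
              f"mean latency: {sum(latencies) / len(latencies) * 1000:.1f} ms")

# measure("http://www.example.com/index.html", num_requests=50, rate=5)
```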
References

R. Engelschall, "Load Balancing your Web site," Web Techniques Magazine, Vol. 3, Issue 5, May 1998.
B. Krishnamurthy, J. Mogul, and D. Kristol, "Key differences between HTTP/1.0 and HTTP/1.1," International WWW Conference, Toronto, May 1999, pp. 659-673.
D. Menasce and V. Almeida, "Capacity Planning for Web Performance," Prentice Hall, June 1998.
D. Mosberger and T. Jin, "httperf - A tool to measure Web server performance," USENIX Symposium on Internet Technologies and Systems, December 1997, pp. 59-67.
S. Nadimpalli, "---," M.Eng. thesis, Department of Systems and Computer Engineering, Carleton University, 2000 (to appear).
J. Dilley, R. Friedrich, T. Jin, and J. Rolia, "Web server performance measurement and modeling techniques," Performance Evaluation Journal, 1998.
A. Dingle, "Cache consistency in the HTTP/1.1 proposed standard," Proceedings of the ICM Workshop on Web Caching, September 1996.
M. Crovella, R. Frangioso, and M. Harchol-Balter, "Connection scheduling in Web servers."
N. Niclausse, Z. Liu, and P. Nain, "A new efficient caching policy for the World Wide Web."