Article

A fragment-based approach for efficiently creating dynamic Web content

Abstract

This article presents a publishing system for efficiently creating dynamic Web content. Complex Web pages are constructed from simpler fragments, and fragments may recursively embed other fragments. Relationships between Web pages and fragments are represented by object dependence graphs. We present algorithms for efficiently detecting and updating the Web pages affected after one or more fragments change. We also present algorithms for publishing sets of Web pages consistently; different algorithms are used depending upon the consistency requirements.

Our publishing system provides an easy method for Web site designers to specify and modify inclusion relationships among Web pages and fragments. Users can update content on multiple Web pages by modifying a template; the system then automatically updates all Web pages affected by the change. Our system accommodates both content that must be proofread before publication, which typically comes from humans, and content that must be published immediately, which typically comes from automated feeds.

We discuss some of our experiences with real deployments of our system as well as its performance. We also quantitatively present characteristics of fragments used at a major deployment of our publishing system, including fragment sizes, update frequencies, and inclusion relationships.
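To make the fragment model concrete, the sketch below shows one way an object dependence graph could be used to find the pages affected by a fragment update. It is a minimal illustration with invented names (ObjectDependenceGraph, add_inclusion), not the paper's actual data structures or algorithms.

```python
from collections import defaultdict, deque

# Hypothetical sketch of an object dependence graph (ODG): edges point from a
# fragment to the objects (pages or fragments) that embed it, so a change can
# be propagated "upward" to every affected page.
class ObjectDependenceGraph:
    def __init__(self):
        self.embedded_in = defaultdict(set)   # fragment -> objects that include it
        self.pages = set()                    # top-level Web pages

    def add_page(self, page):
        self.pages.add(page)

    def add_inclusion(self, fragment, container):
        # "container" embeds "fragment"; containers may themselves be fragments.
        self.embedded_in[fragment].add(container)

    def affected_pages(self, changed_fragments):
        # Breadth-first traversal from the changed fragments to every page
        # that directly or transitively includes them.
        seen, queue = set(changed_fragments), deque(changed_fragments)
        while queue:
            obj = queue.popleft()
            for parent in self.embedded_in[obj]:
                if parent not in seen:
                    seen.add(parent)
                    queue.append(parent)
        return seen & self.pages


odg = ObjectDependenceGraph()
odg.add_page("sports_front_page.html")
odg.add_inclusion("headline_fragment", "scores_fragment")
odg.add_inclusion("scores_fragment", "sports_front_page.html")
print(odg.affected_pages({"headline_fragment"}))  # {'sports_front_page.html'}
```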
... Authors of [10] define an Object Dependence Graph (ODG) to model web pages that are created by aggregating content. An ODG is a Directed Acyclic Graph (DAG) in which pure and aggregated content elements are represented by vertices (nodes). ...
... These states (joined and split) indicate whether the server has to serve the fragments together or independently. We have extended the fragment-based web page representation presented in [10] in order to cover all our modelling requirements. ...
Article
Full-text available
Web cache performance has decreased in Web 2.0 applications due to higher content update rates and a larger number of personalized web pages. This problem can be mitigated by caching content fragments instead of complete web pages. We propose a classification algorithm to determine the fragment design that achieves the best performance. To create the algorithm, we have mined data on content characterization, user behaviour and performance, obtaining two classification trees. These classification trees are used to determine the fragment design. We have optimized the model of a real web site using both classification trees and evaluated the user-observed response time. The results show that optimizing fragment designs can achieve large speedups in the user-perceived response time.
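As a rough illustration of the approach described above, the following sketch trains a decision tree on invented content-characterization and user-behaviour features to choose between a "joined" and a "split" fragment design. The feature set, labels and values are assumptions, not the data mined in the paper.

```python
# A minimal sketch, assuming hypothetical features: a decision tree is trained
# on content-characterization and user-behaviour metrics to pick a fragment
# design ("split" the fragment out, or keep it "joined" with the page).
from sklearn.tree import DecisionTreeClassifier

# Columns: update rate (updates/hour), fragment size (KB), personalization flag.
X = [
    [50,  4,   1],
    [0.1, 120, 0],
    [30,  8,   1],
    [0.5, 60,  0],
]
y = ["split", "joined", "split", "joined"]   # best-performing design observed

clf = DecisionTreeClassifier(max_depth=3).fit(X, y)
print(clf.predict([[20, 10, 1]]))  # e.g. ['split']
```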
... The nodes CE1, CE2 and CE3 correspond to the three root elements of the web pages. If we consider the connected subgraphs with these root elements as starting points, we obtain the content units of the three web pages. This type of representation for modelling the content units of web pages was suggested in [6], where it is called an Object Dependence Graph (ODG). Several research studies have used this type of model to represent web pages or to address related problems [7]. ...
Conference Paper
Full-text available
This paper describes a JMeter extension created to support web structure mining. This type of mining is important, for example, in web performance engineering and in validating web developments. The extension allows users to define the HTML tags that wrap the content units, and it creates a graph model representing the content units of all the web pages and the relationships (aggregations) among them. The usability of the extension has been validated in a real scenario.
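The sketch below illustrates the underlying idea in plain Python rather than as a JMeter extension: content units are identified by user-configured wrapper tags and linked by aggregation edges. The tag set and the ContentUnitGraph class are illustrative assumptions.

```python
# A rough sketch of the idea (not the actual JMeter extension): content units
# are the elements wrapped by user-configured tags, and an edge records that
# one unit aggregates another.
from html.parser import HTMLParser

CONTENT_TAGS = {"article", "section", "div"}   # assumed, user-configurable

class ContentUnitGraph(HTMLParser):
    def __init__(self):
        super().__init__()
        self.edges = []        # (parent_unit, child_unit) aggregation edges
        self.stack = []        # currently open content units, outermost first
        self.counter = 0

    def handle_starttag(self, tag, attrs):
        if tag in CONTENT_TAGS:
            self.counter += 1
            unit = f"{tag}#{self.counter}"
            if self.stack:
                self.edges.append((self.stack[-1], unit))
            self.stack.append(unit)

    def handle_endtag(self, tag):
        if tag in CONTENT_TAGS and self.stack:
            self.stack.pop()

parser = ContentUnitGraph()
parser.feed("<div><section>news</section><section>scores</section></div>")
print(parser.edges)   # [('div#1', 'section#2'), ('div#1', 'section#3')]
```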
... As such, replicating the data everywhere is not suited for applications with a high percentage of data updates. The CDN Akamai, for example, tackles this problem by enabling fragment caching [6]: the responses for popular requests are cached, so the dynamic document need not be regenerated but is simply retrieved from the cache. This mechanism is suitable for requests that do not modify the application data and are not unique; for example, a request for the local weather is time- and location-dependent. ...
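A small, hedged sketch of the kind of caching this context describes: a time- and location-dependent fragment (the weather example) is regenerated only when no response for the same location and time bucket is already cached. The function names and the TTL are illustrative assumptions.

```python
# Cache popular, non-unique responses keyed by the parameters that determine
# their content (here: location and a coarse time bucket).
import time

cache = {}

def get_weather_fragment(location, generate, ttl_seconds=600):
    bucket = int(time.time() // ttl_seconds)       # same bucket -> same content
    key = ("weather", location, bucket)
    if key not in cache:
        cache[key] = generate(location)            # regenerate only on a miss
    return cache[key]

html = get_weather_fragment("Austin", lambda loc: f"<div>Weather for {loc}</div>")
print(html)
```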
Article
Replication in the World-Wide Web covers a wide range of techniques. Often, the redirection of a client browser towards a given replica of a Web page is performed after the client's request has reached the Web server storing the requested page. As an alternative, we propose to perform the redirection as close to the client as possible in a fully distributed and transparent manner. Distributed redirection ensures that we find a replica wherever it is stored and that the closest possible replica is always found first. By exploiting locality, we can keep latency low.
... The concept of db-page fragments was also adopted in [7] for db-page content generation and management. To date, several TF/IDF variants and some other metrics, e.g., PageRank [21], are used to determine documents' weights. ...
Conference Paper
Database-generated dynamic web pages (db-pages for short), whose contents are created on the fly by web applications and databases, are now prominent on the web. However, many of them cannot be searched by existing search engines. Accordingly, we develop a novel search engine named Dash (Db-pAge Search) to support db-page search. Dash determines the db-pages that could be generated by a target web application and its database by exploring the application code and the related database content, and it supports keyword search over those db-pages. In this paper, we present its system design and focus on efficiency. To minimize the costs of collecting, maintaining, indexing and searching a massive number of db-pages that may have overlapping contents, Dash derives and indexes db-page fragments in place of db-pages; each db-page fragment carries a disjoint part of a db-page. To efficiently compute and index db-page fragments from huge datasets, Dash is equipped with MapReduce-based algorithms for database crawling and db-page fragment indexing. In addition, Dash has a top-k search algorithm that can efficiently assemble db-page fragments into db-pages relevant to the search keywords and return the k most relevant ones. The performance of Dash is evaluated via extensive experimentation.
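The following is a much-simplified sketch of the general idea of indexing fragments once and scoring assembled pages from them; it is not Dash's MapReduce pipeline or its actual top-k algorithm, and all data and names are invented.

```python
from collections import defaultdict
import heapq

# Invented toy data: fragment texts and the fragments each page is assembled from.
fragments = {"f1": "cheap laptop deals", "f2": "laptop reviews", "f3": "kitchen tools"}
pages = {"p1": ["f1", "f2"], "p2": ["f2", "f3"], "p3": ["f3"]}

# Index the fragments once (term -> fragment -> term frequency) instead of
# indexing every assembled page, which would duplicate shared content.
index = defaultdict(lambda: defaultdict(int))
for fid, text in fragments.items():
    for term in text.split():
        index[term][fid] += 1

# Map each fragment back to the pages that contain it.
frag_to_pages = defaultdict(list)
for pid, fids in pages.items():
    for fid in fids:
        frag_to_pages[fid].append(pid)

def top_k(query, k=2):
    # Score a page by summing the scores of the fragments it is assembled from.
    scores = defaultdict(int)
    for term in query.split():
        for fid, tf in index[term].items():
            for pid in frag_to_pages[fid]:
                scores[pid] += tf
    return heapq.nlargest(k, scores.items(), key=lambda kv: kv[1])

print(top_k("laptop reviews"))   # pages assembled from the matching fragments first
```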
Article
Segregating web page content into logical chunks is a popular technique for modular organization of a web page. While the chunk-based approach works well for public web scenarios, in mobile-first personalization scenarios the chunking strategy is less effective for performance optimization because of the dynamic nature of the Web content and the granularity of that content. In this paper, the authors propose the Micro Chunk based Web Delivery Framework, built around the novel concept of a "micro chunk", to address the performance challenges posed by regular chunks in personalized web scenarios. The authors describe methods for creating micro chunks and discuss their advantages over regular chunks for a personalized mobile web scenario. They have created a prototype application implementing the framework and benchmarked it against a regular personalized web application to quantify the performance improvements achieved by the micro chunk design.
Article
Dividing a web site or web portal page into logical chunks is a prominent method for better managing web site content and for improving a web site's performance. While this works well for public web pages, personalized pages involve dynamic data, data caching, and privacy and security concerns that complicate creating and caching content chunks, and web portals depend heavily on personalized data. In this paper the authors introduce the concepts of a "personalized content chunk" and a "personalized content spot" that can be used for segregating and efficiently managing personalized web scenarios. The authors' experiments show that the personalized content chunk framework can improve performance by 30%.
Article
In this paper, we propose a novel framework, called CATER, for the automated layout of transactional pages. To automatically generate transactional pages, CATER employs a decision table to determine which kinds of widgets are suitable for different data items and how to arrange them in transactional pages. To improve usability, CATER dynamically gathers users' usage patterns to optimize the layout of transactional pages. CATER also addresses data validation, persistent storage, navigation control and maintenance, so it can be used for general Web applications. From the perspective of designers, CATER is a feasible automated layout solution that can facilitate the development of Web applications and avoid much mechanical manual labor. From the standpoint of users, the framework can improve the usability of transactional pages, reducing the time needed to complete a business process online.
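As an illustration of the decision-table idea, the sketch below maps properties of a data item to an input widget. The rules and widget names are invented for illustration, not CATER's.

```python
# A hedged illustration of a decision table for widget selection: each rule
# maps properties of a data item to a suitable input widget for the generated
# transactional page; the first matching rule wins.
DECISION_TABLE = [
    # (predicate on the data item, chosen widget)
    (lambda item: item["type"] == "enum" and len(item["values"]) <= 4, "radio_group"),
    (lambda item: item["type"] == "enum", "dropdown"),
    (lambda item: item["type"] == "text" and item.get("max_length", 0) > 200, "textarea"),
    (lambda item: item["type"] == "date", "date_picker"),
    (lambda item: True, "text_field"),          # default rule
]

def choose_widget(item):
    for predicate, widget in DECISION_TABLE:
        if predicate(item):
            return widget

print(choose_widget({"type": "enum", "values": ["visa", "mastercard"]}))  # radio_group
```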
Article
Group communication primitives have long been among the tools used to facilitate database replication, and publish/subscribe systems are a natural mechanism for disseminating update notifications for the consistency management of database replicas. However, the subscription and publication languages of current publish/subscribe systems do not efficiently capture the complex relationship between a database update and the data that update affects. In this thesis we specialize the publish/subscribe mechanism to meet the requirements of web database caches. To do so, we exploit foreknowledge of a web application's code and embedded database requests to efficiently support consistency management for typical database workloads. We compare the performance of our system to that obtainable by similar publish/subscribe systems that do not assume such foreknowledge, and examine how our system performs in an environment where the database schema and web application may evolve over time.
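A minimal sketch of the general mechanism described above, with invented names: cached query results subscribe to the table and predicate they depend on, and a published update invalidates only the matching entries. The thesis' actual subscription language and optimizations are not reproduced here.

```python
# Publish/subscribe-driven invalidation for a web database cache (sketch).
class WebDbCache:
    def __init__(self):
        self.results = {}          # query_id -> cached result
        self.subscriptions = []    # (table, predicate, query_id)

    def cache(self, query_id, table, predicate, result):
        self.results[query_id] = result
        self.subscriptions.append((table, predicate, query_id))

    def publish_update(self, table, row):
        # Invalidate only the cached queries whose predicate matches the update.
        for sub_table, predicate, query_id in self.subscriptions:
            if sub_table == table and predicate(row):
                self.results.pop(query_id, None)

cache = WebDbCache()
cache.cache("top_items_in_books", "items",
            lambda row: row["category"] == "books", result=["item1", "item2"])
cache.publish_update("items", {"id": 7, "category": "books"})
print("top_items_in_books" in cache.results)   # False: the entry was invalidated
```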
Conference Paper
With the rapid development of the Web, personalized and dynamic web pages increasingly dominate current-day WWW traffic, and saving bandwidth and reducing latency for users is an urgent problem. Although whole dynamic pages cannot be cached, web pages from the same web site tend to contain many of the same fragments. Fragment-based caching is an effective solution for the delivery of dynamic web pages; however, good methods are needed for dividing web pages into fragments, and manual markup of fragments in dynamic web pages is labor-intensive, error-prone, and unscalable. This paper proposes a model for efficient delivery of dynamic web pages with automatic detection of shared fragments. The model can automatically detect the shared fragments in large collections of web pages. Experimental results show that the model reduces latency and saves bandwidth effectively.
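The sketch below conveys the flavour of automatic shared-fragment detection under strong simplifying assumptions (block boundaries at <div> tags, exact-match hashing); the paper's detection model is more sophisticated.

```python
# Candidate blocks from each page are hashed, and any block that occurs in
# several pages is treated as a shared, independently cacheable fragment.
from collections import defaultdict
import hashlib

def candidate_blocks(page_html):
    # Assumed, simplistic segmentation: split on <div> boundaries.
    return [b.strip() for b in page_html.split("<div>") if b.strip()]

def shared_fragments(pages, min_pages=2):
    occurrences = defaultdict(set)
    for page_id, html in pages.items():
        for block in candidate_blocks(html):
            digest = hashlib.sha1(block.encode()).hexdigest()
            occurrences[digest].add(page_id)
    return [d for d, page_ids in occurrences.items() if len(page_ids) >= min_pages]

pages = {
    "a.html": "<div>nav bar</div><div>story A</div>",
    "b.html": "<div>nav bar</div><div>story B</div>",
}
print(len(shared_fragments(pages)))   # 1: the shared navigation block
```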
Conference Paper
Full-text available
Edge-Side Includes (ESI) is an open mark-up language that allows content providers to break their pages into fragments with individual caching characteristics. A page is reassembled from ESI fragments by a content delivery network (CDN) at an edge server, which selectively downloads from the origin content server only those fragments that are necessary (as opposed to the entire page). This is expected to reduce the load and bandwidth requirements of the content server. This paper proposes an ESI-compliant approach in which page reconstruction occurs at the browser rather than the CDN. We call this client-based approach Client-Side Includes, or CSI. Unlike page assembly at the network edge, CSI optimizes content delivery over the last mile, which is where the true bottleneck often is.
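The idea behind both ESI and CSI can be illustrated with a small assembly step that replaces include tags with cached fragments; whether it runs at an edge server or in the browser, the substitution is the same. This is a simplified illustration with made-up fragment URLs, not a full ESI processor.

```python
# Replace ESI-style include tags with the referenced fragments.
import re

fragment_store = {
    "/fragments/header": "<header>Site header</header>",
    "/fragments/weather?loc=nyc": "<div>NYC: 21C</div>",
}

INCLUDE = re.compile(r'<esi:include\s+src="([^"]+)"\s*/>')

def assemble(template):
    # Substitute each include tag with its fragment (empty string if missing).
    return INCLUDE.sub(lambda m: fragment_store.get(m.group(1), ""), template)

page = '<html><esi:include src="/fragments/header"/>' \
       '<esi:include src="/fragments/weather?loc=nyc"/></html>'
print(assemble(page))
```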
Conference Paper
Full-text available
As Internet traffic continues to grow and web sites become increasingly complex, performance and scalability are major issues for web sites. Web sites are increasingly relying on dynamic content generation applications to provide visitors with dynamic, interactive, and personalized experiences. However, dynamic content generation comes at a cost: each request requires computation as well as communication across multiple components.

To address these issues, various dynamic content caching approaches have been proposed. Proxy-based caching approaches store content at various locations outside the site infrastructure and can improve Web site performance by reducing content generation delays, firewall processing delays, and bandwidth requirements. However, existing proxy-based caching approaches either (a) cache at the page level, which does not guarantee that correct pages are served and provides very limited reusability, or (b) cache at the fragment level, which requires the use of pre-defined page layouts. To address these issues, several back-end caching approaches have been proposed, including query result caching and fragment-level caching. While back-end approaches guarantee the correctness of results and offer the advantages of fine-grained caching, they neither address firewall delays nor reduce bandwidth requirements.

In this paper, we present an approach to, and an implementation of, a dynamic proxy caching technique that combines the benefits of both proxy-based and back-end caching approaches, yet does not suffer from their above-mentioned limitations. Our dynamic proxy caching technique allows granular, proxy-based caching where both the content and the layout can be dynamic. Our analysis of the performance of our approach indicates that it is capable of providing significant reductions in bandwidth. We have also deployed the proposed technique at a major financial institution; the results indicate that it is capable of providing order-of-magnitude reductions in bandwidth and response times in real-world dynamic Web applications.
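A rough sketch of the combined proxy/back-end idea, with invented names and a made-up control protocol: the proxy keeps fragments cached, asks the origin only for the current layout and fragment versions, and fetches just the stale fragments. The paper's actual technique is not reproduced here.

```python
class OriginServer:
    # Stand-in for the origin site: returns a page's layout (fragment ids in
    # order) plus current fragment versions, and serves individual fragments.
    def __init__(self):
        self.versions = {"header": 1, "story": 4}
        self.content = {"header": "<header/>", "story": "<p>latest story</p>"}

    def describe(self, url):
        return ["header", "story"], dict(self.versions)

    def fetch_fragment(self, frag_id):
        return self.content[frag_id]


class DynamicProxyCache:
    def __init__(self, origin):
        self.origin = origin
        self.fragments = {}   # fragment_id -> (version, content)

    def serve(self, url):
        layout, versions = self.origin.describe(url)   # small control message
        parts = []
        for frag_id in layout:
            cached = self.fragments.get(frag_id)
            if cached is None or cached[0] != versions[frag_id]:
                # Fetch only fragments that are missing or stale.
                content = self.origin.fetch_fragment(frag_id)
                self.fragments[frag_id] = (versions[frag_id], content)
            parts.append(self.fragments[frag_id][1])
        return "".join(parts)


proxy = DynamicProxyCache(OriginServer())
print(proxy.serve("/index.html"))   # first request fetches both fragments
print(proxy.serve("/index.html"))   # second request is served from the proxy cache
```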
Article
Full-text available
In this paper we develop a general methodology for characterizing the access patterns of Web server requests based on a time‐series analysis of finite collections of observed data from real systems. Our approach is used together with the access logs from the IBM Web site for the Olympic Games to demonstrate some of its advantages over previous methods and to construct a particular class of benchmarks for large‐scale heavily‐accessed Web server environments. We then apply an instance of this class of benchmarks to analyze aspects of large‐scale Web server performance, demonstrating some additional problems with methods commonly used to evaluate Web server performance at different request traffic intensities.
Article
This paper presents a novel Web object management mechanism called MONARCH. The primary goal of our mechanism is to provide strong cache consistency without requiring servers to maintain per-client state. MONARCH also seeks to reduce the overhead incurred by heuristic-based cache consistency mechanisms.
Article
The Strudel system applies concepts from database management systems to the process of building Web sites. Strudel's key idea is separating the management of the site's data, the creation and management of the site's structure, and the visual presentation of the site's pages. First, the site builder creates a uniform model of all data available at the site. Second, the builder uses this model to declaratively define the Web site's structure by applying a “site-definition query” to the underlying data. The result of evaluating this query is a “site graph”, which represents both the site's content and structure. Third, the builder specifies the visual presentation of pages in Strudel's HTML-template language. The data model underlying Strudel is a semi-structured model of labeled directed graphs. We describe Strudel's key characteristics, report on our experiences using Strudel, and present the technical problems that arose from our experience. We describe our experience constructing several Web sites with Strudel and discuss the impact of potential users' requirements on Strudel's design. We address two main questions: (1) when does a declarative specification of site structure provide significant benefits, and (2) what are the main advantages provided by the semi-structured data model.
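The separation Strudel advocates can be sketched in a few lines: a labeled directed graph of site data, a "site-definition query" that selects which nodes become pages, and a template step for the presentation. The helper names and data below are invented, and Strudel's actual query and template languages are not shown.

```python
# A loose sketch of Strudel's three-way separation: data, structure, presentation.
data = [
    # (subject, label, value) edges of a semi-structured, labeled directed graph
    ("paper1", "type", "paper"), ("paper1", "title", "Fragment caching"),
    ("paper2", "type", "paper"), ("paper2", "title", "Edge assembly"),
    ("alice",  "type", "author"), ("alice", "name", "Alice"),
]

def site_definition_query(graph):
    # Declarative intent: "one page per node whose type is 'paper'".
    papers = [s for s, l, v in graph if l == "type" and v == "paper"]
    return {s: {l: v for s2, l, v in graph if s2 == s} for s in papers}

TEMPLATE = "<html><h1>{title}</h1></html>"   # stand-in for the HTML-template language

site_graph = site_definition_query(data)
pages = {node: TEMPLATE.format(title=attrs["title"]) for node, attrs in site_graph.items()}
print(pages["paper1"])
```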
Article
The quantitative results presented in our SIGCOMM '97 paper [1] include numerous minor errors. These errors were caused by programming bugs that led to faulty analyses and simulations, and by inaccurate transcriptions during the preparation of the paper. Here we present corrected figures and tables, as well as corrections to values that appeared in the text of the original paper. The effect of correcting the errors is to reduce the differences between the results based on the proxy trace and those based on the packet-level trace. Our overall conclusions are not significantly altered.
Article
The primary purpose of a programming language is to assist the programmer in the practice of her art. Each language is either designed for a class of problems or supports a different style of programming. In other words, a programming language turns the computer into a 'virtual machine' whose features and capabilities are unlimited. In this article, we illustrate these aspects through a language similar to Logo. Programs are developed to draw geometric pictures using this language.