A Publishing System for Efﬁciently Creating
Dynamic Web Content
Jim Challenger, Arun Iyengar, Karen Witting
IBM Research
T.J. Watson Research Center
P.O. Box 704
Yorktown Heights, NY 10598

Cameron Ferstat, Paul Reed
IBM Global Services
17 Skyline Drive
Hawthorne, NY 10532
Abstract—This paper presents a publishing system for efﬁciently creat-
ing dynamic Web content. Complex Web pages are constructed from sim-
pler fragments. Fragments may recursively embed other fragments. Re-
lationships between Web pages and fragments are represented by object
dependence graphs. We present algorithms for efﬁciently detecting and up-
dating Web pages affected after one or more fragments change. We also
present algorithms for publishing sets of Web pages consistently; different
algorithms are used depending upon the consistency requirements.
Our publishing system provides an easy method for Web site designers
to specify and modify inclusion relationships among Web pages and frag-
ments. Users can update content on multiple Web pages by modifying a
template. The system then automatically updates all Web pages affected by
the change. Our system accommodates both content that must be proofread
before publication and is typically from humans as well as content that has
to be published immediately and is typically from automated feeds.
Our system is being deployed at several popular Web sites including the
2000 Olympic Games Web site. We discuss some of our experiences with
real deployments of our system as well as its performance.
Many Web sites need to provide dynamic content. Examples
include sport sites, stock market sites, and virtual stores or
auction sites where information on available products is con-
There are several problems with providing dynamic data to
clients efficiently and consistently. A key problem with dynamic
data is that it can be expensive to create; a typical dynamic page
may require several orders of magnitude more CPU time to serve
than a typical static page of comparable size. The overhead
for dynamic data is a major problem for Web sites which re-
ceive substantial request volumes. Signiﬁcant hardware may be
needed for such Web sites.
A key requirement for many Web sites providing dynamic
data is to completely and consistently update pages which have
changed. In other words, if a change to underlying data affects
multiple pages, all such pages should be correctly updated.
In addition, a bundle of several changed pages may have to be
made visible to clients at the same time. For example, publish-
ing pages in bundles instead of individually may prevent situa-
tions where a client views a ﬁrst page, clicks on a hypertext link
to view a second page, and sees information on the second page
which is older and not consistent with the information on the
Depending upon the way in which dynamic data are being
served, achieving complete and consistent updates can be difficult
or inefficient. Many Web sites cache dynamic data in memory
or a file system in order to reduce the overhead of recalculating
Web pages every time they are requested. In these
systems, it is often difﬁcult to identify which cached pages are
affected by a change to underlying data which modiﬁes several
dynamic Web pages. In making sure that all obsolete data are
invalidated, deleting some current data from cache may be un-
avoidable. Consequently, cache miss rates after an update may
be high, adversely affecting performance. In addition, multiple
cache invalidations from a single update must be made consis-
This paper presents a system for efﬁciently and consistently
publishing dynamic Web content. In order to reduce the over-
head of generating dynamic pages from scratch, our system
composes dynamic pages from simpler entities known as frag-
ments. Fragments typically represent parts of Web pages which
change together; when a change to underlying data occurs which
affects several Web pages, the fragments affected by the change
can easily be identiﬁed. It is possible for a fragment to recur-
sively embed another fragment.
Our system provides a user-friendly method for managing
complex Web pages composed of fragments. Users specify how
Web pages are composed from fragments by creating templates
in a markup language. Templates are parsed to determine in-
clusion relationships among fragments and Web pages. These
inclusion relationships are represented by a graph known as an
object dependence graph (ODG). Graph traversal algorithms are
applied to ODG’s in order to determine how changes should be
propagated throughout the Web site after one or more fragments
change.
Our system allows multiple independent authors to provide
content as well as multiple independent proofreaders to approve
some pages for publication and reject others. Publication may
proceed in multiple stages in which a set of pages must be ap-
proved in one stage before it is passed to the next stage. Our
system can also include a link checker which veriﬁes that a Web
page has no broken hypertext links at the time the page is pub-
A key feature of our system is that it is scalable to handle high
request rates. We are deploying our system at several popular
Web sites including the 2000 Olympic Games Web site.
The remainder of the paper is organized as follows. Section II
describes the architecture of our system in detail. Section III
discusses some of our experiences with deploying our system
at real Web sites. Section IV describes the performance of our
system. Section V discusses related work. Finally, Section VI
summarizes our main results and conclusions.
II. SYSTEM ARCHITECTURE
A. Constructing Web Pages from Fragments
A key feature of our system is that it composes complex Web
pages from simpler fragments (Figure 8). A page is a complete
entity which may be served to a client. We say that a fragment
or page is atomic if it doesn’t include any other fragments and
complex if it includes other fragments. An object is either a page
or a fragment.
Our approach is efﬁcient because the overhead for compos-
ing an object from simpler fragments is usually minor. By con-
trast, the overhead for constructing the object from scratch as
an atomic fragment is generally much higher. Using the frag-
ment approach, it is possible to achieve signiﬁcant performance
improvements without caching dynamic pages and dealing with
the difﬁculties of keeping caches consistent. For optimal per-
formance, our system has the ability to cache dynamic pages.
Caching capabilities are integrated with fragment management.
The fragment-based approach for generating Web pages
makes it easier to design Web sites in addition to improving
performance. It is easy to design a set of Web pages with a common
look and feel. It is also easy to embed common information into
several Web pages. Sets of Web pages containing similar infor-
mation can be managed together. For example, it is easy to up-
date common information represented by a single fragment but
embedded within multiple pages; in order to update the common
information everywhere, only the fragment needs to be changed.
By contrast, if the Web pages are stored statically in a file
system, identifying and updating all pages affected by a change can
be difﬁcult. Once all changed pages have been identiﬁed, care
must be taken to update all changed pages in order to preserve
Dynamic Web pages which embed fragments are implicitly
updated any time an embedded fragment changes, so consis-
tency is automatically achieved. Consistency becomes an is-
sue with the fragment-based approach when the pages are being
published to a cache or ﬁle system. Our system provides several
different methods for consistently publishing Web pages in these
situations; each method provides a different level of consistency.
A.2 Object Dependence Graphs
When pages are constructed from fragments, it is important
to construct a fragment before any object containing it is
constructed. In order to construct objects in an efficient order,
our system represents relationships between fragments and Web
pages by graphs known as object dependence graphs (ODG’s)
(Figures 1 and 2).
Object dependence graphs may have several different edge
types. An inclusion edge indicates that an object embeds a
fragment. A link edge indicates that an object contains a hypertext
link to another object.
In the ODG in Figure 2, all but one of the edges are inclusion
edges. An inclusion edge from a fragment to an object indicates
that the object contains the fragment; thus, when the fragment
changes, it should be updated before the containing object
is updated. The graph resulting from only inclusion edges is
a directed acyclic graph.
Fig. 1. A set of Web pages containing fragments.
Fig. 2. The object dependence graph (ODG) corresponding to Figure 1.
The remaining edge is a link edge, which indicates that one page
contains a hypertext link to another. A key reason for maintaining
link edges is to prevent dangling or inconsistent hypertext
links. In this example, the link edge indicates that publishing the
referring page before the linked page will result in a broken
hypertext link. Similarly, when both pages change, publishing a
current version of the referring page before publishing a current
version of the linked page could present inconsistent information
to clients who view an updated version of the referring page, click
on the hypertext link to an outdated version of the linked page,
and then see information which is obsolete relative to the
referring page. Link edges can form cycles within an ODG. This
would occur, for example, if two pages both contain hypertext
links to each other.
There are two methods for creating and modifying ODG’s.
Using one approach, users specify how Web pages are com-
posed from fragments by creating templates in a markup lan-
guage. Templates are parsed to determine inclusion relation-
ships among fragments and Web pages. Using the second ap-
proach, a program may directly manipulate edges and vertices
of an ODG by using an API.
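The paper does not give the interface of this API. As a minimal sketch, assuming a hypothetical `ODG` class with `add_edge`, `remove_edge`, and `successors` methods (none of these names come from the system itself), a program might manipulate the graph as follows. Edge types are plain strings so that types other than inclusion and link can be added later. The page names `graf.html` and `draw.html` are illustrative only.

```python
from collections import defaultdict


class ODG:
    """Hypothetical object dependence graph; not the system's actual API."""

    def __init__(self):
        self.vertices = set()
        # edges[edge_type][source] -> set of target vertices
        self.edges = defaultdict(lambda: defaultdict(set))

    def add_edge(self, source, target, edge_type="inclusion"):
        self.vertices.update((source, target))
        self.edges[edge_type][source].add(target)

    def remove_edge(self, source, target, edge_type="inclusion"):
        self.edges[edge_type][source].discard(target)

    def successors(self, vertex, edge_type="inclusion"):
        return set(self.edges[edge_type][vertex])


odg = ODG()
# An inclusion edge runs from a fragment to the object that embeds it.
odg.add_edge("copyr.frg", "footer.frg")
odg.add_edge("footer.frg", "graf.html")
# The direction convention for link edges is an assumption of this sketch.
odg.add_edge("draw.html", "graf.html", edge_type="link")
```

Keeping one adjacency map per edge type lets a traversal follow inclusion edges while ignoring link edges, or the reverse.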
Our system allows an arbitrary number of edge types to exist
in ODG’s. So far, we have only found practical use for inclusion
and link edges. We suspect that there may be other types of
important relationships which can be represented by other edge
When our system becomes aware of changes to a set S of
one or more objects, it does a depth-first graph traversal using
topological sort to determine all vertices reachable from S by
following inclusion edges. The topological sort orders vertices
such that whenever there is an edge from a vertex v to another
vertex w, v appears before w in the topological sort. For example,
a valid topological sort of the graph in Figure 2 after , , and
change would be , , , , , , , and . This
topological sort ignores link edges.
Objects are updated in an order consistent with the topologi-
cal sort. Our system updates objects in parallel when possible.
In the previous example, , ,and can be updated in par-
allel. After is updated, and may be updated in parallel.
A number of other objects may be constructed in parallel in a
manner consistent with the inclusion edges of the ODG.
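The traversal described above can be sketched as a depth-first search whose reversed postorder gives a valid topological order. This is an illustrative sketch, not the system's implementation. The function name `update_order`, the dictionary representation of inclusion edges, and the object names are all assumptions.

```python
def update_order(includes_into, changed):
    """Return the affected objects in an order consistent with a topological sort.

    includes_into maps each object to the objects that directly embed it
    (inclusion edges only; link edges are ignored).
    """
    visited = set()
    postorder = []

    def visit(v):
        visited.add(v)
        for w in sorted(includes_into.get(v, ())):
            if w not in visited:
                visit(w)
        postorder.append(v)

    for v in sorted(changed):
        if v not in visited:
            visit(v)
    # The reversed DFS postorder of an acyclic graph is a topological order.
    return postorder[::-1]


# Hypothetical ODG: f1 is embedded in f2 and P2, f2 in P1, and f3 in P2.
includes_into = {"f1": ["f2", "P2"], "f2": ["P1"], "f3": ["P2"]}
order = update_order(includes_into, {"f1"})
```

Only objects reachable from the changed set appear in the result, so the unchanged fragment `f3` is never rebuilt. Objects with no path between them in the result are candidates for parallel updates.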
After a set of pages, U, has been updated (or generated for
the first time), the pages in U are published so that they can be
viewed by clients. In some cases, the pages are published to ﬁle
systems. In other cases, they are published to caches. Pages may
be published either locally on the system generating them or to
a remote system. It is often a requirement for a set of multiple
pages to be published consistently. Consistency can be guar-
anteed by publishing all changed (or newly generated) pages in
a single atomic action. One potential drawback to this method
of publication is that the publication process may be relatively
long. For example, pages may have to be proofread before pub-
lication. If everything is published together in a single atomic
action, there can be considerable delay before any information
is made available.
Therefore, incremental publication, wherein information is
published in stages instead of together, is often desirable. The
disadvantage to incremental publication is that consistency
guarantees are not as strong. Our system provides three different
methods for incremental publication, each providing different
levels of consistency.
The ﬁrst incremental publishing method guarantees that a
freshly published page will not contain a hypertext link to ei-
ther an obsolete or unpublished page. This consistency guar-
antee applies to pages reached by following several hypertext
links. More specifically, if P1 and P2 are two pages in the updated
set U and a client views an updated version of P1 and follows one or
more hypertext links to view P2, then the client is guaranteed to see a
version of P2 which is not obsolete with respect to the version
of P1 which the client viewed (a version of P2 is obsolete with
respect to a version of P1 if the version of P2 was outdated at
the time the version of P1 became current, regardless of whether
P1 and P2 have any fragments in common).
For example, consider the Web pages in Figure 3. A client can
access the last page in the chain by starting at the first page,
following a hypertext link to an intermediate page, and then
following a second hypertext link to the last page. Suppose that both
the first and last pages change. The first incremental publishing
method guarantees that the new version of the first page will not be
published before the new version of the last page, regardless of
whether the intermediate page has changed.
Fig. 3. A set of Web pages connected by hypertext links.
This incremental publishing method is implemented by first
determining the set R of all pages which can be reached by
following hypertext links from a page in the updated set U. R
includes all pages of U; it may also include previously published
pages which haven’t changed. R is determined by traversing link
edges in reverse order starting from pages in U.
Let G be the subgraph of the ODG consisting of all nodes in
R and link edges in the ODG connecting nodes in R. G is
topologically sorted, and its strongly connected components are
determined. A strongly connected component of a directed graph
is a maximal subset of vertices C such that every vertex in C
has a directed path to every other vertex in C. A good algorithm
for finding strongly connected components in directed graphs is
contained in .
Vertices in G are then examined in an order consistent with
the topological sort of G. Each time a page in U is examined for
which the updated version hasn’t been published yet, the page
is published together with all other pages in U belonging to the
same strongly connected component. Each set of pages which
are published together in an atomic action is known as a bundle.
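The bundling step above can be sketched with Tarjan's strongly connected components algorithm, which may or may not be the algorithm the paper's reference describes. Tarjan's algorithm emits a component only after every component reachable from it, so link targets are bundled before the pages that refer to them. The function name `link_safe_bundles`, the `links_to` mapping (page to the pages it hyperlinks to), and the page names are assumptions of this sketch.

```python
def link_safe_bundles(links_to, changed):
    """Group changed pages into bundles, in publication order (link targets first).

    links_to maps a page to the pages it contains hypertext links to.
    """
    index, low, on_stack = {}, {}, set()
    stack, bundles = [], []

    def strongconnect(v):
        index[v] = low[v] = len(index)
        stack.append(v)
        on_stack.add(v)
        for w in links_to.get(v, ()):
            if w not in index:
                strongconnect(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:  # v is the root of a component
            component = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                component.add(w)
                if w == v:
                    break
            # Only pages with unpublished updates go into the bundle.
            bundle = component & changed
            if bundle:
                bundles.append(bundle)

    # Starting from the changed pages visits exactly the pages that can be
    # reached from them by following hypertext links.
    for page in sorted(changed):
        if page not in index:
            strongconnect(page)
    return bundles


# Hypothetical chain of pages: A links to B, and B links to C.
chain_bundles = link_safe_bundles({"A": ["B"], "B": ["C"]}, {"A", "C"})
# Two pages that link to each other must share a bundle.
cycle_bundles = link_safe_bundles({"A": ["B"], "B": ["A"]}, {"A", "B"})
```

The recursive form is kept for brevity. A site with long chains of links would need an iterative version to stay within Python's recursion limit.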
The second incremental publishing method guarantees that
any two pages in the updated set U which both contain a common
changed fragment are published in the same bundle. For example,
consider the Web pages in Figure 4. Suppose that two fragments change.
Since two of the pages both embed the first changed fragment, their
updated versions must be published together. Since one of those
pages and the third page both embed the second changed fragment,
their updated versions must be published together. Thus, updated
versions of all three Web pages must be published together. Note
that this includes a pair of pages which must be published together
even though the two pages don’t embed a common fragment.
Fig. 4. A set of Web pages containing common fragments.
In order to implement this approach, the set of all changed
fragments contained within each changed object o is determined.
We call this set the changed fragment set for o and
denote it by C(o). All changed objects are constructed in
topological sorting order. When a changed object o is constructed,
C(o) is calculated as the union of {f} and C(f) for each
fragment f such that a dependence edge from f to o exists in the ODG.
After all changed fragment sets have been determined, an
undirected graph H is constructed in which the vertices of
H are pages in the updated set U. An edge exists between two pages
P1 and P2 in H if C(P1) and C(P2) have at least one fragment in common.
H is examined to determine its connected components (two
vertices are part of the same connected component if and only if
there is a path between the vertices in the graph). All pages
belonging to the same connected component are published in the
same bundle.
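As a rough sketch of this second method, the code below computes changed fragment sets in topological order and then groups pages with a union-find structure, which yields the same connected components without building the undirected graph explicitly. The function names, the `embeds` mapping, and the object names are assumptions. The sketch also assumes that only fragments which were themselves changed or rebuilt contribute to a changed fragment set.

```python
def changed_fragment_sets(embeds, order):
    """embeds maps an object to the fragments it directly embeds.

    order lists every changed object, fragments before their containers.
    """
    changed = set(order)
    cfs = {}
    for obj in order:
        cfs[obj] = set()
        for frag in embeds.get(obj, ()):
            if frag in changed:
                cfs[obj] |= {frag} | cfs[frag]
    return cfs


def fragment_bundles(cfs, changed_pages):
    """Group pages whose changed fragment sets overlap, directly or transitively."""
    parent = {p: p for p in changed_pages}

    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]  # path halving
            p = parent[p]
        return p

    first_page_with = {}  # changed fragment -> first page seen containing it
    for page in sorted(changed_pages):
        for frag in cfs[page]:
            if frag in first_page_with:
                parent[find(page)] = find(first_page_with[frag])
            else:
                first_page_with[frag] = page

    groups = {}
    for page in changed_pages:
        groups.setdefault(find(page), set()).add(page)
    return list(groups.values())


# Hypothetical layout: P1 and P2 embed f1, while P2 and P3 embed f2.
embeds = {"P1": ["f1"], "P2": ["f1", "f2"], "P3": ["f2"]}
cfs = changed_fragment_sets(embeds, ["f1", "f2", "P1", "P2", "P3"])
bundles = fragment_bundles(cfs, ["P1", "P2", "P3"])
```

Here `P1` and `P3` share no fragment, yet both end up in the same bundle because each shares a changed fragment with `P2`.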
The third incremental publishing method satisfies the
consistency guarantees of both the first and second method. In other
words:
1. A freshly published page will not contain a hypertext link to
either an obsolete or unpublished page. More specifically, if P1
and P2 are two pages in the updated set U and a client views an
updated version of P1 and follows one or more hypertext links to
view P2, then the client is guaranteed to see a version of P2 which
is not obsolete with respect to the version of P1 which the client viewed.
2. Any two changed pages which both contain a common
changed fragment are published together.
This method generally results in publishing fewer bundles but
of larger sizes than the ﬁrst two approaches.
For example, consider the Web pages in Figure 5. Suppose
that both and change. Updated versions of and must
be published together because they both embed Since
contains a hypertext link to the updated version of cannot
be published before the bundle containing updated versions of
Fig. 5. Another set of related Web pages.
If, instead, the first incremental publishing method were used
to publish the Web pages in Figure 5, the updated version of
could not be published before the updated version of
However, the updated version of would not have to be pub-
lished in the same bundle as the updated version of If the
second incremental publishing method were used, updated ver-
sions of both and would have to be published together in
the same bundle. However, publication of the updated version
of would be allowed to precede publication of the bundle
containing updated versions of and
The third incremental publishing method is implemented by
constructing the link-edge subgraph G as in the first incremental
publishing method and changed fragment sets C(P) as in the second
incremental publishing method. Additional edges are then added to G
between pages in the updated set U. For all pages P1 and P2 in U such
that C(P1) and C(P2) have a fragment in common, directed edges from
both P1 to P2 and P2 to P1 are then added. The same procedure is then
applied to G to publish pages in bundles as in the first method.
Incremental publishing methods can be designed for other
consistency requirements as well. For example, consider Figure 3.
Suppose that both the page at the start of the chain of hypertext
links and the page at its end change. It may be desirable
to publish updated versions of both pages in the same bundle.
This would avoid the following situation which could occur using
the first incremental publishing method.
A client views an old version of the starting page. After following
hypertext links, the client arrives at a new version of the end page.
The browser’s cache is then used to go back to the old version of the
starting page. The client reloads the starting page in order to obtain
a version consistent with the end page but still sees the old version
because the new version of the starting page has not
yet been published.
It is straightforward to implement an incremental publishing
method which would publish both pages in the same bundle using
techniques similar to the ones just described.
B. The Publishing System
B.1 Combined Content Pages
Many Web sites contain information that is fed from multi-
ple sources. Some of the information, such as the latest scores
from a sporting event, is generated automatically by a computer.
Other information, such as news stories, is generated by hu-
mans. Both types of information are subject to change. A page
containing both human and computer-generated information is
known as a combined content page.
A key problem with serving combined content pages is the
different rates at which sources produce content. Computer-
generated content tends to be produced at a relatively high rate,
often as fast as the most sophisticated timing technology per-
mits. Human-generated content is produced at a much lower
rate. Thus, it is difﬁcult for humans to keep pace with auto-
mated feeds. By the time an editor has ﬁnished with a page, the
actual results on the page may have changed. If the editor takes
time to update the page, the results may have changed yet again.
A requirement for many of the Web sites we have helped de-
sign is that computer-generated content should not be delayed
by humans. Computer-generated results, such as the latest re-
sults from a sporting event, are often extremely important and
should be published as soon as possible. If computer-generated
results are combined with human-edited content using conven-
tional Web publishing systems, publication of the computer-
generated results can be delayed signiﬁcantly. What is needed
is a scheme to combine data feeds of differing speeds so that
information arriving at high rates is not unnecessarily delayed.
In order to provide combined content pages, our system di-
vides fragments into two categories. Immediate fragments are
fragments which contain vital information which should be published
quickly with minimal proofreading. For the sports Web
sites that our system is being used for, the latest results in a
sporting event would be published as an immediate fragment.
Quality controlled fragments are fragments which don’t have to
be published as quickly as immediate fragments but have con-
tent which must be examined in order to determine whether the
fragments are suitable to be published. Background stories on
athletes are typically published as quality controlled fragments
at the sports sites which use our system. Combined content Web
pages consist of a mixture of immediate and quality controlled
fragments.
When one or more immediate fragments change, the Web
pages affected by the changes are updated and published without
proofreading. If both immediate and quality controlled fragments
change, the system first performs updates resulting from
the immediate fragments and publishes the updated Web pages
immediately. It subsequently performs updates resulting from
quality controlled fragments and only publishes these updated
Web pages after they have been proofread. Multiple versions of
a combined content page may be published using this approach.
The first version would be the page before any updates. The second
version might contain updates to all immediate fragments
but not to any quality controlled fragments. The third version
might contain updates to all fragments.
It is possible for an update to an immediate fragment f1 to
be published before an update to a quality controlled fragment f2
even though f2 changed before f1. This might occur if the
changes to f2 are delayed in publication due to proofreading.
B.2 System Description
Web pages produced by our system typically consist of mul-
tiple fragments. Each fragment may originate from a different
source and may be produced at a different rate than other fragments.
Fragments may be nested, permitting the construction of
complex and sophisticated pages. Completed pages are written
to sinks, which may be file systems, Web server accelerators,
or even other HTTP servers.
The Trigger Monitor is the software which takes objects from
one or more sources, constructs pages, and writes the con-
structed pages to one or more sinks (Figure 6). Relationships
between fragments are maintained in a persistent ODG which
preserves state information in the event of a system crash. Our
new Trigger Monitor has signiﬁcantly enhanced functionality
compared with the Trigger Monitor used for the 1998 Olympic
Games Web site.
Fig. 6. Schematic of the Publish Process.
Whenever the Trigger Monitor is notiﬁed of a modiﬁcation,
addition, or deletion of one or more objects, it fetches new
copies of the changed objects from one or more sources. The
ODG is updated by parsing changed objects. The graph traver-
sal algorithms described in Section II-A.2 are then applied to
determine all Web pages which need to be updated and an ef-
ﬁcient order for updating them. Finally, bundles of published
pages are written to the sinks.
Since the Trigger Monitor is aware of all fragments and pages,
synchronization is possible to prevent corruption of the pages.
The ODG is used as the synchronization object to keep the
fragment space consistent. Many “trigger handlers”, each with
their own sources and sinks, may be conﬁgured to use a com-
mon ODG. This design permits, for example, a slow-moving,
carefully edited human-generated set of pages and fragments
to be integrated with a high-speed, automated, database-driven
content source. Because the ODG is aware of the entire frag-
ment space and the interrelationship of the objects within that
space, synchronization points can be chosen to ensure that mul-
tiple, differently-sourced, differently-paced content streams re-
Multiple Trigger Monitor instances may be chained, the sinks
of earlier instances becoming the sources for later ones. This
allows publication to take place in multiple stages. For example,
the publishing system for the 2000 Summer Olympic Games
Web site consists of the following stages (Figure 7):
Development is the ﬁrst step in the process. Fragments which
appear on many Web pages (such as generic headers and footers)
as well as overall site design occur here. The output of develop-
ment may be structurally complete but lacking in content.
Staging takes as its input, or source, the output, or sink, of De-
velopment. Editors polish pages and combine content from var-
ious sources. Finished pages are the result.
Quality Assurance takes as its source the sink of Staging. Pages
are examined here for correctness and appropriateness.
Automated Results are produced when a database trigger is gen-
erated as the result of an update. The trigger causes programs
to be executed that extract current results and compose relevant
updated pages and fragments. Unlike the previous stages, no
human intervention occurs in this stage.
Production is where pages are served from. Its source is the
sink of QA, and its sinks are the serving directories and caches.
Note how one stage can use the sink of another stage as
its source. The automated feed updates each source at the
same time, but independently of the human-driven stages. This
achieves the dual goals of keeping the entire site consistent
while publishing content immediately from automated feeds.
A similar organization was used for the 1998 Winter Olympic
Games Web site. The primary difference was that the process of
moving pages from one stage to the next was purely manual. In
other words, authors had to keep track of all the pages that were
affected by their changes and move them down to Staging, edi-
tors had to move their material to Q/A, and so on. This required
the authors to know something about the editing process and the
editors to know about Q/A. Learning the process was difﬁcult
enough; changing it was even worse.
Our new system eliminates most of the procedural difficulties
which were experienced at the 1998 Olympic Games Web site.
Stages can be added and deleted easily. Data sources can be
added and deleted with little or no disruption to the ﬂow. The
new system adapts much more easily to changing conditions and
requires people working on specific stages of the system to know
less about what is required for other stages.
To demonstrate how a site might be built from fragments, we
present a real example from the ofﬁcial Web site for the 1999
French Open Tennis Tournament. A site architect views the
player page for Stefﬁ Graf (shown in Figure 8) as consisting of
a standard header, sidebar, and footer, with biographical infor-
mation and recent results thrown in. The site architect composes
HTML similar to the following, establishing a general layout for
Fig. 7. Schematic of the Publish Process.
<!-- %include(header.frg) -->
<td><!-- %include(sidebr.frg) --></td>
<tr><!-- %fragment(graf_bio.frg) --></tr>
<tr><!-- %fragment(graf_score.frg) --></tr>
<!-- %include(footer.frg) -->
where “footer.frg” consists of
<!-- %fragment(factoid.frg) -->
<!-- %fragment(copyr.frg) -->
Prior to the beginning of play, the contents of “graf_score.frg”
will be empty, since no matches have commenced. This means
the part of the page outlined by the dashed box in Figure 8 will,
at ﬁrst, be empty. The ﬁrst publication of this fragment will
result in the ODG seen to the right of Stefﬁ Graf’s player page
in Figure 8. Again, the objects and edges within the dashed box
will not yet be within the ODG, since no match play has yet
Using fragments in this way permits many architects, editors,
and even automated systems to modify the page simultaneously.
Our system ensures that all changes are properly included in
the ﬁnal page that is seen by the user. An architect updating
the structure of the page does not need to know anything about
copyrights, trademarks, the size of the sponsor’s logos, the
look-and-feel of the site, or any of the data that will be included on
the page. Similarly, an editor wishing to change the look-and-
feel of a site does not need to understand the structure of any
Major site changes, like changing the look-and-feel of a site,
are as simple as changing a single page. For example, changing
the sidebar to reﬂect the end of a long event is as simple as up-
dating “sidebr.frg”. To change the look-and-feel of a site, an ed-
itor only needs to change “header.frg” and “footer.frg”. For both
these kinds of changes, the system will use the ODG from Fig-
ure 8 to determine that Stefﬁ Graf’s page must be rebuilt (along
with many others). Once all pages have been rebuilt, they will
be republished. The user will see the changes on every page, al-
though the vast majority of underlying fragments will not have
More static information, like player biographies, can be kept
up-to-date in one place but used on many pages. For example,
“graf_bio.frg” is used on our example page, but may also be
used in many other places. To include a new photo or update
the information included in the biography, the editors need only
concern themselves with updating “graf_bio.frg”. The system
ensures that all pages which include “graf_bio.frg” will
automatically be rebuilt.
Since scoring information will change frequently once a ten-
nis match is in progress, updating that aspect of a page can
be handled by an automated process. As a match begins,
“graf_score.frg” is updated to include the match in progress.
This means that once the final has begun, the “graf_score.frg”
page will consist of HTML similar to
<!-- %fragment(final.frg) -->
<!-- %fragment(semi.frg) -->
When the updated “graf_score.frg” is published, the system
will detect that it now includes “final.frg” and “semi.frg” and
will update the ODG as shown in the dashed box within Figure 8.
Now, as the final match progresses, only “final.frg” needs
to be updated and published through our system. As part of the
publication process, the system will detect that “final.frg” is
included in “graf_score.frg”, causing “graf_score.frg” to be rebuilt
using the updated score. Likewise, the system will detect that
Stefﬁ Graf’s page must be rebuilt as well, and a new page will
be built including the updated scoring information. Eventually,
when the match completes, the complete page shown in the ex-
ample is produced.
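The detection step in this example can be sketched as follows. The sketch assumes that both directive forms shown in the templates, `%include(...)` and `%fragment(...)`, create an inclusion edge; the paper does not spell out how the two differ. The function names and the page name `graf.html` are hypothetical, and the real parser is certainly more involved.

```python
import re

# Matches both directive forms that appear in the example templates.
DIRECTIVE = re.compile(r"<!--\s*%(?:include|fragment)\(([^)]+)\)\s*-->")


def embedded_fragments(template_text):
    """Return the names of the fragments a template embeds, in document order."""
    return [name.strip() for name in DIRECTIVE.findall(template_text)]


def objects_to_rebuild(templates, changed):
    """Return every object that directly or indirectly embeds `changed`.

    templates maps an object name to its template text.
    """
    containers = {}  # fragment -> objects that directly embed it
    for obj, text in templates.items():
        for frag in embedded_fragments(text):
            containers.setdefault(frag, set()).add(obj)

    pending, affected = [changed], set()
    while pending:
        for parent in containers.get(pending.pop(), ()):
            if parent not in affected:
                affected.add(parent)
                pending.append(parent)
    return affected


templates = {
    # "graf.html" is a hypothetical name for the player page.
    "graf.html": "<!-- %include(header.frg) -->\n"
                 "<tr><!-- %fragment(graf_score.frg) --></tr>\n"
                 "<!-- %include(footer.frg) -->",
    "graf_score.frg": "<!-- %fragment(final.frg) -->\n"
                      "<!-- %fragment(semi.frg) -->",
    "footer.frg": "<!-- %fragment(factoid.frg) -->\n"
                  "<!-- %fragment(copyr.frg) -->",
}
affected = objects_to_rebuild(templates, "final.frg")
```

Updating `final.frg` marks `graf_score.frg` and then the player page for rebuilding, while `header.frg` and `footer.frg` are left untouched.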
The score for the ﬁnal match will be displayed on many
pages other than Stefﬁ Graf’s player page. For instance, Martina
Hingis’s player page will also include these results, as will the
scoreboard page while the match is in progress. A page listing
matchups between different players will also contain the score.
To update all of these pages, the automated system only updates
one fragment. This keeps the automated system independent of
the site design.
III. DEPLOYMENT EXPERIENCES
One of the key things our publishing system enables is sep-
aration of the creative process from the mechanical process of
building a Web site. Previously, the content, look, and feel of
large sites we were involved with had to be carefully planned
Fig. 8. Sample screen shot from the ofﬁcial Web site for the French Open Tennis Tournament.
well in advance of the creation of the ﬁrst page. Changes to the
original plans were quite difﬁcult to execute, even in the best
of circumstances. Last-minute changes tended to be impossible,
resulting in a choice between delayed or ﬂawed site publication.
With our publishing system, the entire look and feel of a site
can be changed and republished within minutes. Aside from the
cost savings, this has allowed tremendous creativity on the part
of designers. Entire site designs can be created, experimented
with, changed, discarded, and replaced several times a day dur-
ing the construction of the site. This can take place in parallel
with and independently of the creation of site content.
A speciﬁc example of this was demonstrated just before a
new site look for the 2000 Sydney Olympic Games Web site
(http://www.olympics.com) was made public. One day before
the site was to go live before the public, it was decided that the
search facility was not working sufﬁciently well and must be
removed. This change affected thousands of pages, and would
previously have delayed publication of the site by as much as
several days. Using our system, the site authors simply removed
the search button from the appropriate fragment and republished
the fragment. Ten minutes later, the change was complete, every
page had been rebuilt, and the site went live on schedule.
Figures 9-12 characterize the objects and ODG’s at the 2000
Olympic Games Web site in early November of 1999. Recall
that an object is either a page or a fragment. Figure 9 shows
the distribution of object sizes. Figure 10 shows the distribution
of the number of incoming edges for ODG nodes. Figure 11
shows the distribution of the number of outgoing edges for ODG
nodes. Finally, Figure 12 shows the distribution of maximum
levels at which objects are recursively embedded. The embed
depth of an object is the maximum length of any path in the
ODG originating from the object.
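As a minimal sketch of how the quantities plotted in Figures 10-12 could be computed, the code below derives incoming and outgoing edge counts and the embed depth for a small ODG. The graph, the object names, and the assumption that an edge a -> b means "b embeds a" are all hypothetical. The recursive longest-path computation also assumes the ODG is acyclic and small enough for plain recursion.

```python
from functools import lru_cache

# Hypothetical ODG edges; (a, b) is assumed to mean "b embeds a".
EDGES = [
    ("score.frg", "player_box.frg"),
    ("player_box.frg", "player.html"),
    ("score.frg", "scoreboard.html"),
    ("banner.frg", "player.html"),
]

def build_graph(edges):
    """Return (outgoing adjacency lists, incoming edge counts)."""
    out_edges, in_degree = {}, {}
    for src, dst in edges:
        out_edges.setdefault(src, []).append(dst)
        out_edges.setdefault(dst, [])
        in_degree[dst] = in_degree.get(dst, 0) + 1
        in_degree.setdefault(src, 0)
    return out_edges, in_degree

def embed_depths(out_edges):
    """Longest path, in edges, originating from each node (acyclic ODG)."""
    @lru_cache(maxsize=None)
    def depth(node):
        return max((1 + depth(nxt) for nxt in out_edges[node]), default=0)
    return {node: depth(node) for node in out_edges}

out_edges, in_degree = build_graph(EDGES)
out_degree = {node: len(targets) for node, targets in out_edges.items()}
depths = embed_depths(out_edges)
print(in_degree, out_degree, depths)
```

Under these assumptions a top-level page has embed depth 0, and a fragment's depth counts how many levels of embedding lie above it.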
The number of objects at the Web site will increase as the
start date for the 2000 Olympic Games approaches. Once the
Olympic Games are in full swing, the number of objects at the
site will likely exceed the number corresponding to Figures 9-12
by a factor of more than ten.
IV. SYSTEM PERFORMANCE
This section describes the performance of a Java implementation
of our system running on an IBM Intellistation containing
a 333 MHz Pentium II processor with 256 Mbytes of memory
and the Windows NT (version 4.0) operating system. The
distribution of Web page sizes is similar to the one for the 1998
Olympic Games Web site as well as more recent Web sites
deploying our system; the average Web page size is around 10
Kbytes. Fragment sizes are typically several hundred bytes but
usually less than 1 Kbyte. The distribution of fragment sizes is
also representative of real Web sites deploying our system.
Fig. 9. The distribution of object sizes at the 2000 Olympic Games Web site.
Each bar represents the number of objects contained in the size range whose
upper limit is shown on the X-axis. (X-axis: object size in bytes; Y-axis:
number of objects, logarithmic scale.)
Fig. 10. The distribution of the number of incoming edges for nodes of ODG’s
at the 2000 Olympic Games Web site. (X-axis: number of incoming edges;
Y-axis: number of objects, logarithmic scale.)
Figure 13 shows the CPU time in milliseconds required for
constructing and publishing bundles of various sizes. Times are
averaged over 100 runs. All 100 runs were submitted
simultaneously, so the times in the figure reflect the ability of the
runs to be executed in parallel. The solid curve depicts times when
all objects which need to be constructed are explicitly triggered.
The dotted line depicts times when a single fragment which is
included in multiple pages is triggered; the pages which need to
be built as a result of the change to the fragment are determined
from the ODG. Graph traversal algorithms applied to the ODG
have relatively low overhead. By contrast, each object which is
triggered has to be read from disk and parsed; these operations
consume considerable CPU time. As the graph indicates,
it is more desirable to trigger a few objects, which are included
in multiple pages, than to trigger all objects which need to be
constructed.
Fig. 11. The distribution of the number of outgoing edges for nodes of ODG’s
at the 2000 Olympic Games Web site. (X-axis: number of outgoing edges;
Y-axis: number of objects, logarithmic scale.)
Fig. 12. The distribution of the degree to which objects are embedded at the
2000 Olympic Games Web site. (X-axis: embed depth; Y-axis: number of
objects, logarithmic scale.)
Our implementation allows multiple complex objects to be
constructed in parallel. As a result, we are able to achieve near
100% CPU utilization, even when construction of an object was
blocked due to I/O, by concurrently constructing other objects.
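The idea of overlapping blocked constructions with other work can be sketched with a thread pool. This is an illustrative Python sketch, not the Java implementation measured here: the `construct` function, the page names, the 50 ms sleep standing in for disk I/O, and the pool size are all assumptions.

```python
from concurrent.futures import ThreadPoolExecutor
import time

def construct(name):
    """Stand-in for building one object; the sleep simulates blocking I/O."""
    time.sleep(0.05)
    return name, f"<html>{name}</html>"

def construct_all(names, workers=8):
    """Build all objects concurrently and return a name -> content map."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(construct, names))

pages = [f"page{i}.html" for i in range(16)]
start = time.perf_counter()
built = construct_all(pages)
elapsed = time.perf_counter() - start
print(f"built {len(built)} objects in {elapsed:.2f} s")
```

While one worker waits on its simulated read, the others keep running, so the total elapsed time is much less than the sum of the individual construction times. In CPython, threads overlap I/O waits rather than CPU-bound work, so this sketch shows the scheduling idea only and not the CPU utilization figures reported above.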
Fig. 13. The CPU time in milliseconds required to construct and publish bundles
of various sizes. (X-axis: pages in bundle; curves: all objects triggered,
one object triggered.)
The breakdown as to where CPU time is consumed is shown
in Figure 14. CPU time is divided into the following categories:
Retrieve, parse: time to read all triggered objects from disk
  and parse them for determining included fragments.
ODG update: time for updating the ODG based on the
  information obtained from parsing objects and for analyzing the
  ODG to determine all objects which need to be updated and an
  efficient order for updating the objects.
Assembly: time to update all objects.
Save data: time to save all updated objects on disk.
Send ack: time to send an acknowledgment message via
  HTTP that publication is complete.
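The "ODG update" category covers both finding the affected objects and choosing an order in which to rebuild them. One plausible ordering, assumed here rather than taken from the text, is a topological order in which every fragment is rebuilt before any object that embeds it. The sketch below uses Python's standard `graphlib` module (Python 3.9 or later); the inclusion map and object names are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical inclusion map: object -> fragments it embeds directly.
EMBEDS = {
    "player.html": ["header.frg", "score_box.frg"],
    "score_box.frg": ["score.frg"],
    "scoreboard.html": ["score.frg", "header.frg"],
    "about.html": ["header.frg"],
}

def update_order(embeds, triggered):
    """Return affected objects, each listed after the fragments it embeds."""
    # Invert the inclusion map: fragment -> objects that embed it.
    dependents = {}
    for obj, parts in embeds.items():
        for part in parts:
            dependents.setdefault(part, []).append(obj)

    # Collect everything reachable from the triggered objects.
    affected, stack = set(triggered), list(triggered)
    while stack:
        for dep in dependents.get(stack.pop(), []):
            if dep not in affected:
                affected.add(dep)
                stack.append(dep)

    # Keep only edges between affected objects, then sort topologically.
    graph = {obj: [p for p in embeds.get(obj, []) if p in affected]
             for obj in affected}
    return list(TopologicalSorter(graph).static_order())

order = update_order(EMBEDS, ["score.frg"])
print(order)
```

With this ordering each object is assembled once, after all of its changed fragments are current. Objects that do not depend on the triggered fragment, such as "about.html" here, are never touched.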
CPU Breakdown, absolute times (bars: retrieve/parse, ODG update, assembly,
save data, send ack; series: 1 to 100, 100 to 100)
Fig. 14. The breakdown in CPU time required to construct and publish a typical
In the bars marked 1 to 100, one fragment included in 100
others was triggered. The 100 pages which needed to be
constructed were determined from the ODG. In the bars marked 100
to 100, the 100 pages which needed to be constructed were all
triggered. The times shown in Figure 14 are the average times
for a single page. The total average time for constructing and
publishing a page in the 1 to 100 case is 25.86 milliseconds
(represented by the aggregate of all bars); the corresponding time for
the 100 to 100 case is 44.51 milliseconds.
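To put the two averages side by side, the short calculation below works out their ratio and difference. The per-page figures come from the text. The bundle-level saving simply multiplies the per-page difference by 100 pages, which is an assumption about how the averages scale and not a measurement.

```python
# Per-page averages reported in the text (milliseconds).
ONE_TO_100_MS = 25.86   # one fragment triggered, 100 pages rebuilt
ALL_100_MS = 44.51      # all 100 pages explicitly triggered
PAGES = 100

ratio = ALL_100_MS / ONE_TO_100_MS
saved_per_page = ALL_100_MS - ONE_TO_100_MS
# Assumption: total CPU for the bundle is the per-page average times 100.
saved_per_bundle = saved_per_page * PAGES

print(f"ratio: {ratio:.2f}")
print(f"saved per page: {saved_per_page:.2f} ms")
print(f"saved per bundle: {saved_per_bundle:.0f} ms")
```

Explicitly triggering every page costs roughly 1.72 times as much CPU per page as triggering the shared fragment, a difference of about 18.65 milliseconds per page.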
The retrieve and parse time is signiﬁcantly higher for the 100
to 100 case because the system is reading and parsing 100 ob-
jects compared with 1 in the 1 to 100 case. Since the source for
every object that is triggered must be saved, the time it takes to
save the data is somewhat longer when 100 objects are triggered
than when only one object is triggered.
Figure 15 shows how the average construction and publica-
tion time varies with the number of embedded fragments within
a Web page. Figure 16 shows how the average construction and
publication time varies with the number of fragments which are
triggered for a Web page containing 20 fragments. Both graphs
are averaged over 100 runs.
Fig. 15. The average CPU time in milliseconds required to construct and publish
a complex Web page as a function of the number of embedded fragments.
In each case, one fragment in the page was triggered.
Fig. 16. The average CPU time in milliseconds required to construct and publish
a complex Web page as a function of the number of fragments triggered.
V. RELATED WORK
There are a number of Web content management tools on
the market today, such as NetObjects Fusion, Allaire’s
ColdFusion and HomeSite, FutureTense’s Internet Publishing
System, Eventus Software’s Control, Wallop Software’s
Build-It (now owned by IBM), Site Technologies’
SiteMaster, and Microsoft’s Visual InterDev.
As far as we know, none of these products allows nested
fragments to the degree that ours does. Most of them don’t allow
any type of embedded fragments. They are also not designed to
publish content in multiple stages as ours is.
A key problem with many products such as Fusion and
SiteMaster is that they only work well when all of the Web
content is designed using the product. They don’t provide rich
programmatic interfaces which can deal with or import content
from external sources or feeds. These products thus lack the
ability to treat external data with the same level of control and
consistency as the sources of data the application owns.
By contrast, our system allows Web pages to come from mul-
tiple external sources. This is a key requirement for many of the
Web sites we have encountered. Build-It is similar to our system
in that it works with Web content created by other sources.
However, we found that Build-It was not able to handle Web sites as
large as the 1998 Olympic Games Web site, for example. Our
system is scalable to handle extremely large Web sites.
Our system uses many ideas from the system used for the
1998 Olympic Games Web site. That system used an earlier
version of the Trigger Monitor to maintain updated caches of
dynamic data. The original Trigger Monitor maintained updated
caches by reacting to database triggers. When a database change
occurred, a database trigger invoked a UDF (User Defined
Function) that sent a message to the Trigger Monitor containing an
encoded summary of the change. The Trigger Monitor decoded
the message, consulted an ODG to determine which pages were
affected, requested pages from a non-caching HTTP server, and
ﬁnally replaced the updated pages in the caches of the servers
connected to the Web.
While the 1998 Olympic Games system worked extremely
well for maintaining updated caches, it lacked the features
of our new system for automatically and consistently
publishing dynamic content. While the earlier system used ob-
ject dependence graphs for determining how changes to under-
lying data affected cached objects, it didn’t have capabilities for
automatically constructing pages and fragments in an optimal
order. The earlier system also couldn’t publish combined con-
tent pages efﬁciently and had fewer options for bundling Web
pages for consistent publication.
VI. SUMMARY AND CONCLUSIONS
We have presented a publishing system for efficiently creating
dynamic Web content. Our publishing system constructs com-
plex objects from fragments which may recursively embed other
fragments. Relationships between Web pages and fragments are
represented by object dependence graphs. We presented algo-
rithms for efﬁciently detecting and updating all affected Web
pages after one or more fragments change.
After a set of multiple Web pages change or are created for
the ﬁrst time, the Web pages must be published to an audience.
Publishing all changed Web pages in a single atomic action
avoids consistency problems but may cause delays in publica-
tion, particularly if the newly constructed pages must be proof-
read before publication. Incremental publication can provide
information faster but may also result in inconsistencies across
published Web pages. We presented three algorithms for incremental
publication designed to handle different consistency requirements.
Our publishing system provides an easy method for Web site
designers to specify and modify inclusion relationships among
Web pages and fragments. Users can update content on multiple
Web pages by modifying a template. The system then automat-
ically updates all Web pages affected by the change. It is easy
to change the look and feel of an entire Web site as well as to
consistently update common information on many Web pages.
Our system accommodates both quality controlled fragments
that must be proofread before publication and are typically from
humans as well as immediate fragments that have to be pub-
lished immediately and are typically from automated feeds. A
Web page can combine both quality controlled and immediate
fragments and still be updated in a timely fashion.
Our publishing system has been implemented in both Java
and C++ and is being deployed at several popular Web sites in-
cluding the 2000 Olympic Games Web site. We discussed some
of our experiences with real deployments of our system as well
as its performance.
ACKNOWLEDGMENTS
Several people have contributed to this work including Paul
Dantzig, Peter Davis, Daniel Dias, Glenn Druce, Sara Elo, Grant
Emery, Peter Fiorese, Kip Hansen, Brenden O’Sullivan, Kent
Rankin, and Jerry Spivak.
REFERENCES
Allaire’s ColdFusion and HomeSite. http://www.allaire.com/.
Jim Challenger, Paul Dantzig, and Arun Iyengar. A Scalable and Highly
  Available System for Serving Dynamic Data at Frequently Accessed Web
  Sites. In Proceedings of ACM/IEEE SC98, November 1998.
Jim Challenger, Arun Iyengar, and Paul Dantzig. A Scalable System for
  Consistently Caching Dynamic Web Data. In Proceedings of IEEE
  INFOCOM ’99, March 1999.
T. Cormen, C. Leiserson, and R. Rivest. Introduction to Algorithms, MIT
FutureTense’s Internet Publishing System. http://www.futuretense.com/.
Eventus Software’s Control. http://www.eventus.com/.
Arun Iyengar and Jim Challenger. Improving Web Server Performance by
  Caching Dynamic Data. In Proceedings of the 1997 USENIX Symposium
  on Internet Technologies and Systems, December 1997.
Arun Iyengar, Mark Squillante, and Li Zhang. Analysis and
  Characterization of Large-scale Web Server Access Patterns and
  Performance. In World Wide Web, June 1999.
Eric Levy-Abegnoli, Arun Iyengar, Junehwa Song, and Daniel Dias. Design
  and Performance of a Web Server Accelerator. In Proceedings of IEEE
  INFOCOM ’99, March 1999.
Microsoft’s Visual InterDev. http://www.microsoft.com/.
NetObjects Fusion. http://www.netobjects.com/.
Site Technologies’ SiteMaster. http://www.sitetech.com/.
Wallop Software’s Build-It. http://www.wallop.com/.