Conference Paper

Zoetrope: Interacting with the Ephemeral Web

DOI: 10.1145/1449715.1449756
Conference: Proceedings of the 21st Annual ACM Symposium on User Interface Software and Technology (UIST), Monterey, CA, USA, October 19-22, 2008
Source: DBLP


The Web is ephemeral. Pages change frequently, and it is nearly impossible to find data or follow a link after the underlying page evolves. We present Zoetrope, a system that enables interaction with the historical Web (pages, links, and embedded data) that would otherwise be lost to time. Using a number of novel interactions, the temporal Web can be manipulated, queried, and analyzed from the context of familiar pages. Zoetrope is based on a set of operators for manipulating content streams. We describe these primitives and the associated indexing strategies for handling temporal Web data. They form the basis of Zoetrope and enable our construction of new temporal interactions and visualizations.

ACM Classification: H5.2 (Information interfaces and presentation): User Interfaces - Graphical user interfaces.
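The abstract's operator model can be illustrated with a small sketch: a content stream as a time-ordered sequence of samples from one page region, with composable operators that slice, filter, and transform it. All names and data below are illustrative assumptions, not the paper's actual API.

```python
from dataclasses import dataclass
from datetime import datetime

# A content stream: a time-ordered sequence of (timestamp, content) samples
# captured from one region of a page. Names here are hypothetical.
@dataclass(frozen=True)
class Sample:
    time: datetime
    content: str

def time_slice(stream, start, end):
    """Operator: keep samples whose timestamp falls in [start, end)."""
    return [s for s in stream if start <= s.time < end]

def keyword_filter(stream, word):
    """Operator: keep samples whose content mentions a keyword."""
    return [s for s in stream if word in s.content]

def transform(stream, fn):
    """Operator: derive a new stream, e.g. extracting a value per sample."""
    return [Sample(s.time, fn(s.content)) for s in stream]

# Operators compose: slice a stream to a time window, then filter or extract.
stream = [
    Sample(datetime(2008, 1, 1), "gas $2.99"),
    Sample(datetime(2008, 6, 1), "gas $3.49"),
    Sample(datetime(2008, 9, 1), "gas $4.05"),
]
summer = time_slice(stream, datetime(2008, 5, 1), datetime(2008, 10, 1))
prices = transform(summer, lambda c: c.split("$")[1])
```

Chaining operators this way mirrors the abstract's claim that a small set of stream primitives can underpin richer temporal queries and visualizations.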



Available from: Mira Dontcheva, Dec 14, 2014
  • Source
    • "The poor integration of archival content in regular Web navigation is also a fundamental hindrance to applications that require finding, analyzing, extracting, comparing, and otherwise leveraging historical Web information. Examples include Zoetrope, a tool that allows interaction with and visualization of high-resolution temporal Web data [1]; DiffIE, a Web browser plug-in that emphasizes Web content that changed since a previous visit [32]; and time-oriented search that tracks the frequency of words and phrases in resources over time [20]. These applications must build their own special-purpose archives in an ad-hoc manner in order to achieve their goals. "
    ABSTRACT: The Web is ephemeral. Many resources have representations that change over time, and many of those representations are lost forever. A lucky few manage to reappear as archived resources that carry their own URIs. For example, some content management systems maintain version pages that reflect a frozen prior state of their changing resources. Archives recurrently crawl the web to obtain the actual representation of resources, and subsequently make those available via special-purpose archived resources. In both cases, the archival copies have URIs that are protocol-wise disconnected from the URI of the resource of which they represent a prior state. Indeed, the lack of temporal capabilities in the most common Web protocol, HTTP, prevents getting to an archived resource on the basis of the URI of its original. This turns accessing archived resources into a significant discovery challenge for both human and software agents, which typically involves following a multitude of links from the original to the archival resource, or searching archives for the original URI. This paper proposes the protocol-based Memento solution to address this problem, and describes a proof-of-concept experiment that includes major servers of archival content, including Wikipedia and the Internet Archive. The Memento solution is based on existing HTTP capabilities applied in a novel way to add the temporal dimension. The result is a framework in which archived resources can seamlessly be reached via the URI of their original: protocol-based time travel for the Web.
    Full-text · Article · Nov 2009
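The datetime negotiation this abstract describes became the Memento protocol's `Accept-Datetime` request header, carrying an HTTP-date in GMT. A minimal sketch of constructing such a header in Python (the helper name is illustrative; no request is actually sent):

```python
from datetime import datetime, timezone
from email.utils import format_datetime

def memento_headers(target_time):
    """Build the Accept-Datetime request header used in Memento
    datetime negotiation: an HTTP-date (RFC 1123) string in GMT."""
    utc = target_time.astimezone(timezone.utc)
    return {"Accept-Datetime": format_datetime(utc, usegmt=True)}

# Ask a Memento-aware server (a "TimeGate") for the representation of
# a resource as it existed around this moment in time.
headers = memento_headers(datetime(2009, 11, 1, 12, 0, tzinfo=timezone.utc))
```

A client would attach these headers to a GET against the resource's TimeGate, which redirects to the archived representation closest to the requested datetime.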
  • Source
    • "Retaining such a history of web pages visited by the user for later-time analysis has attracted some attention. For example, Zoetrope [3] implements a variety of browser-side plugins that enable proper capturing and replaying multiple versions of complex web pages – even those which include dynamic content, cookies, and advertisements etc. In EverLast we currently utilize a proxy server that has been enhanced with archiving abilities to achieve reasonable captures. "
    ABSTRACT: The World Wide Web has become a key source of knowledge pertaining to almost every walk of life. Unfortunately, much of the data on the Web is highly ephemeral in nature, with more than 50-80% of content estimated to be changing within a short time. Continuing the pioneering efforts of many national (digital) libraries, organizations such as the International Internet Preservation Consortium (IIPC), the Internet Archive (IA) and the European Archive (EA) have been tirelessly working towards preserving the ever changing Web. However, while these web archiving efforts have paid significant attention towards long term preservation of Web data, they have paid little attention to developing a global-scale infrastructure for collecting, archiving, and performing historical analyses on the collected data. Based on insights from our recent work on building text analytics for Web Archives, we propose EverLast, a scalable distributed framework for next-generation Web archival and temporal text analytics over the archive. Our system is built on a loosely-coupled distributed architecture that can be deployed over large-scale peer-to-peer networks. In this way, we allow the integration of many archival efforts taken mainly at a national level by national digital libraries. Key features of EverLast include support of time-based text search & analysis and the use of human-assisted archive gathering. In this paper, we outline the overall architecture of EverLast and present some promising preliminary results.
    Full-text · Conference Paper · Jan 2009