September 2024 · 3 Reads
September 2024 · 3 Reads
June 2023 · 105 Reads · 1 Citation
Web archives, such as the Internet Archive, preserve the web and allow access to prior states of web pages. We implicitly trust their versions of archived pages, but as their role moves from preserving curios of the past to facilitating present-day adjudication, we are concerned with verifying the fixity of archived web pages, or mementos, to ensure they have always remained unaltered. A widely used technique in digital preservation to verify the fixity of an archived resource is to periodically compute a cryptographic hash value on a resource and then compare it with a previous hash value. If the hash values generated on the same resource are identical, then the fixity of the resource is verified. We tested this process by conducting a study on 16,627 mementos from 17 public web archives. We replayed and downloaded the mementos 39 times using a headless browser over a period of 442 days and generated a hash for each memento after each download, resulting in 39 hashes per memento. The hash is calculated by including not only the content of the base HTML of a memento but also all embedded resources, such as images and style sheets. We expected to always observe the same hash for a memento regardless of the number of downloads. However, our results indicate that 88.45% of mementos produce more than one unique hash value, and about 16% (or one in six) of those mementos always produce different hash values. We identify and quantify the types of changes that cause the same memento to produce different hashes. These results point to the need for defining an archive-aware hashing function, as conventional hashing functions are not suitable for replayed archived web pages.
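The periodic fixity check described in this abstract can be sketched in a few lines. The construction below (hash each embedded resource, then hash the sorted resource digests together with the base HTML) is one plausible realization for illustration, not the exact procedure used in the study; the function names are ours:

```python
import hashlib

def composite_hash(base_html: bytes, embedded_resources: list[bytes]) -> str:
    """Compute one fixity hash covering a memento's base HTML plus all of
    its embedded resources (images, style sheets, scripts).

    Each resource is hashed individually and the digests are sorted first,
    so the result does not depend on the order resources were downloaded in.
    """
    resource_digests = sorted(
        hashlib.sha256(r).hexdigest() for r in embedded_resources
    )
    outer = hashlib.sha256(base_html)
    for digest in resource_digests:
        outer.update(digest.encode("ascii"))
    return outer.hexdigest()

def fixity_verified(stored_hash: str, base_html: bytes,
                    embedded_resources: list[bytes]) -> bool:
    """Fixity holds if a freshly computed hash matches the stored one."""
    return composite_hash(base_html, embedded_resources) == stored_hash
```

In the study's terms, each of the 39 downloads of a memento would yield one such hash; any mismatch across downloads signals that replay did not produce byte-identical content.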
September 2022 · 16 Reads · 2 Citations
Lecture Notes in Computer Science
Linkages between research outputs are crucial in the scholarly knowledge graph. They include online citations, but also links between versions that differ according to various dimensions and links to resources that were used to arrive at research results. In current scholarly communication systems this information is only made available post factum and is obtained via elaborate batch processing. In this paper we report on work aimed at making linkages available in real-time, in which an alternative, decentralised scholarly communication network is considered that consists of interacting data nodes that host artifacts and service nodes that add value to artifacts. The first result of this work, the “Event Notifications in Value-Adding Networks” specification, details interoperability requirements for the exchange of real-time life-cycle information pertaining to artifacts using Linked Data Notifications. In an experiment, we applied our specification to one particular use-case: distributing Scholix data-literature links to a network of Belgian institutional repositories by a national service node. The results of our experiment confirm the potential of our approach and provide a framework to create a network of interacting nodes implementing the core scholarly functions (registration, certification, awareness and archiving) in a decentralized and decoupled way.
Keywords: Scholarly communication · Digital libraries · Open science
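A notification in such a network could look roughly as follows. This sketch uses an Activity Streams 2.0 "Announce" payload delivered as JSON-LD to a repository's Linked Data Notifications inbox; the exact vocabulary and payload shape are our assumptions for illustration, not quoted from the specification:

```python
import json
import uuid

def build_announce(article_uri: str, dataset_uri: str, service_uri: str) -> dict:
    """Sketch of an Activity Streams 2.0 'Announce' notification with which
    a service node could announce a discovered data-literature link.
    The payload would be POSTed as JSON-LD to the data node's LDN inbox."""
    return {
        "@context": "https://www.w3.org/ns/activitystreams",
        "id": f"urn:uuid:{uuid.uuid4()}",
        "type": "Announce",
        "actor": {"id": service_uri, "type": "Service"},
        "object": {"id": dataset_uri, "type": "Dataset"},
        "context": {"id": article_uri, "type": "Document"},
    }

def serialize(notification: dict) -> bytes:
    """Request body for the HTTP POST (Content-Type: application/ld+json)."""
    return json.dumps(notification).encode("utf-8")
```

Because each notification is a small, self-describing message, data nodes can react to life-cycle events as they happen instead of waiting for batch processing.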
August 2022 · 73 Reads
June 2022 · 11 Reads · 2 Citations
July 2021 · 33 Reads · 8 Citations
The Internet Archive pioneered web archiving and remains the largest publicly accessible web archive hosting archived copies of web pages (Mementos) going back as far as early 1996. Its holdings have grown steadily since, and it hosts more than 881 billion URIs as of September 2019. However, the landscape of web archiving has changed significantly over the last two decades. Today we can freely access Mementos from more than 20 web archives around the world, operated by for-profit and nonprofit organisations, national libraries and academic institutions, as well as individuals. The resulting diversity improves the odds of the survival of archived records but also requires technical standards to ensure interoperability between archival systems. To date, the Memento Protocol and the WARC file format are the main enablers of interoperability between web archives. We describe a variety of tools and services that leverage the broad adoption of the Memento Protocol and discuss a selection of research efforts that would likely not have been possible without these interoperability standards. In addition, we outline examples of technical specifications that build on the ability of machines to access resource versions on the Web in an automatic, standardised and interoperable manner.
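The Memento Protocol (RFC 7089) mentioned above standardises datetime negotiation: a client asks a TimeGate for the archived version of a resource closest to a desired datetime via the Accept-Datetime request header. A minimal sketch of building such a request:

```python
from datetime import datetime, timezone
from email.utils import format_datetime

def memento_request_headers(dt: datetime) -> dict:
    """Request headers for Memento datetime negotiation (RFC 7089).
    `dt` must be timezone-aware; Accept-Datetime carries an RFC 1123
    date in GMT."""
    return {
        "Accept-Datetime": format_datetime(dt.astimezone(timezone.utc),
                                           usegmt=True)
    }

headers = memento_request_headers(datetime(2019, 9, 1, tzinfo=timezone.utc))
# A TimeGate answers with a redirect to the temporally closest memento and
# states the snapshot time in its Memento-Datetime response header.
```

Any HTTP client can send these headers to the TimeGate of a Memento-compliant archive, which is what makes cross-archive tools and aggregators possible.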
August 2020 · 633 Reads
In July 1995 the first issue of D-Lib Magazine was published as an on-line, HTML-only, open access magazine, serving as the focal point for the then emerging digital library research community. In 2017 it ceased publication, in part due to the maturity of the community it served as well as the increasing availability of and competition from eprints, institutional repositories, conferences, social media, and online journals -- the very ecosystem that D-Lib Magazine nurtured and enabled. As long-time members of the digital library community and authors with the most contributions to D-Lib Magazine, we reflect on the history of the digital library community and D-Lib Magazine, taking its very first issue as guidance. It contained three articles, which described: the Dublin Core Metadata Element Set, a project status report from the NSF/DARPA/NASA-funded Digital Library Initiative (DLI), and a summary of the Kahn-Wilensky Framework (KWF) which gave us, among other things, Digital Object Identifiers (DOIs). These technologies, as well as many more described in D-Lib Magazine through its 23 years, have had a profound and continuing impact on the digital library and general web communities.
September 2019 · 20 Reads
Web archiving frameworks are commonly assessed by the quality of their archival records and by their ability to operate at scale. The ubiquity of dynamic web content poses a significant challenge for crawler-based solutions such as the Internet Archive that are optimized for scale. Human-driven services such as the Webrecorder tool provide high-quality archival captures but are not optimized to operate at scale. We introduce the Memento Tracer framework that aims to balance archival quality and scalability. We outline its concept and architecture and evaluate its archival quality and operation at scale. Our findings indicate that archival quality is on par with or better than that of established archiving frameworks, and that operation at scale comes with a manageable overhead.
August 2019 · 12 Reads · 12 Citations
Lecture Notes in Computer Science
July 2019 · 11 Reads
Digital repositories can often easily be navigated by humans but not by machines. We introduce Signposting, a mechanism to show machines how to maneuver repositories’ objects and how to interpret their relationships. Signposting is based on standard and widely adopted web technologies: typed links and HTTP Link headers.
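Typed links in HTTP Link headers are straightforward for machines to consume. The sketch below is a deliberately minimal parser for the common `<uri>; rel="relation"` form, showing how Signposting relation types such as cite-as or item could be picked out of a response header (the header value is an invented example):

```python
import re

def parse_link_header(value: str) -> list[tuple[str, str]]:
    """Very small HTTP Link header parser: returns (target URI, rel) pairs.
    Handles only the common '<uri>; rel="relation"' form; one rel attribute
    may carry several whitespace-separated relation types."""
    pairs = []
    for part in value.split(","):
        match = re.search(r'<([^>]+)>\s*;.*?rel="([^"]+)"', part)
        if match:
            uri, rels = match.groups()
            pairs.extend((uri, rel) for rel in rels.split())
    return pairs

header = ('<https://doi.org/10.1234/x>; rel="cite-as", '
          '<https://example.org/article.pdf>; rel="item"; '
          'type="application/pdf"')
links = parse_link_header(header)
```

A client that understands these relation types can, for instance, follow cite-as to the persistent identifier of an object and item to its content resources without any repository-specific logic.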
... Our requirements assume that archive services can create authentic mementos of Event Logs so that fixity information can be verified. Aturban et al. [23] show that, in general cases, current web archives, such as the Internet Archive, routinely fail to offer authentic mementos to external applications when replaying archived web pages. The mementos that are presented to typical users of web archives are often not the raw data that was archived but a processed version that presents the archive's best effort to create human-interpretable past versions of the web. ...
June 2023
... In that future scenario, the overview of research output in Biblio will be kept up to date more automatically, reducing the administrative burden and increasing the guarantee of completeness. To make that possible, we are using the renewed Biblio to build the foundation for a more decentralised infrastructure in which data and data enrichments can flow back and forth automatically, among other things by means of notification-based protocols for institutional repositories (Hochstenbach et al., 2022). These protocols make it possible to discover, in an automated way, which publications and datasets of an author are known worldwide, with the aim of aggregating publication data in a reliable manner. ...
September 2022
Lecture Notes in Computer Science
... The openness of a standard, while critical, is arguably not sufficient for widespread adoption. Achieving successful interoperability often depends on having a significant platform, either through commercial influence or through community involvement as articulated by Nelson and Van de Sompel (2022): ...
June 2022
... The concept of aggregation goes beyond the Memento specification by leveraging a similar structure to TimeMaps but allowing the URIs contained within the aggregated TimeMap to identify resources at multiple archives instead of a single archive. The Research Library at Los Alamos National Laboratory (LANL) deployed the original Memento aggregator [9,18], currently accessible through a web interface via the Time Travel service at https://timetravel.mementoweb.org/. ...
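The aggregation step described in this excerpt essentially merges per-archive TimeMaps into one chronologically ordered list of mementos. A toy sketch (the data structures are ours, not the aggregator's actual implementation):

```python
from datetime import datetime

def aggregate_timemaps(
    timemaps: dict[str, list[tuple[str, datetime]]]
) -> list[tuple[str, datetime]]:
    """Merge per-archive memento lists of (URI-M, archival datetime) into a
    single aggregated TimeMap sorted by archival datetime, so that mementos
    held at different archives are interleaved chronologically."""
    merged = [m for mementos in timemaps.values() for m in mementos]
    return sorted(merged, key=lambda memento: memento[1])
```

An aggregator additionally has to query each archive, normalise its TimeMap serialization, and handle timeouts; the merge itself is the simple part shown here.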
July 2021
... The preservation of digital ads requires not just storing files but also ensuring they can be replayed in future browsers, which may not support older media formats or web standards. Klein et al. (2019) highlight the increasing difficulty in maintaining archival quality due to the proliferation of dynamic web content, particularly content that is only accessible through the activation of JavaScript-based features. [8] Online ads often depend on external scripts, tracking pixels, and third-party services to function correctly. ...
Reference:
Archiving Digital Marketing
August 2019
Lecture Notes in Computer Science
... For example, investigations by Klein and Balakireva (2022) of DOIs (arguably the most ubiquitous PID type) suggest widespread DOI request failures and inconsistent machine responses from organizations using them. Members of the same research team have also proposed their Memento 'Robust Links' approach as a means of improving the reliability of URL and URI-based referencing on the web, including with respect to PIDs (Klein et al., 2018). PIDs are therefore only persistent insofar as a PID registration service commits to resolving them, or insofar as a publisher commits to updating a PID registry with the current location of a web resource. ...
May 2018
... Klein et al. [28] (2018) undertake a study to determine whether or not it is possible to carry out targeted crawls on the archived web. They are able to efficiently query 22 web archives that contribute to the overall creation of event collections by using the Memento architecture. ...
May 2018
... They proposed a new way to publish Linked Data. They also demonstrated this by combining queries with other relational data sources to archive the DBpedia version [2]. ...
October 2017
Journal of Documentation
... Most research involving Memento aggregation relates to usage of the aggregator rather than enhancement of the aggregation process. In the same way that prior to MemGator, researchers would state "we requested URIs from the Time Travel Service", this statement was transformed to "we used MemGator to request URIs", indicative that it was useful for researchers to utilize their own aggregator instance [4,14,21]. A facet of this use case is the ability for researchers to customize the set of web archives to be used as the basis for querying, which is performed prior to running MemGator by modifying a configuration file. ...
June 2017
... Biological and Biomedical Sciences subjects lead with 35% of records in the ORCID system. The ORCID iD is supportive as it maintains machine-readable researcher profiles (Klein & Van de Sompel, 2017) and unique identification. However, ORCID has a low number of profiles in comparison with other academic scholarly networking sites, i.e. ...
June 2017