Conference Paper · PDF available

Functional Requirements for Information Resource Provenance on the Web

Abstract

HTTP transactions have semantics that can be interpreted in many ways. At a low level, a physical stream of bits is transmitted from server to client. Higher up, those bits resolve into a message with a specific bit pattern. More abstractly, information, regardless of the physical representation, has been transferred. While the mechanisms associated with these abstractions, such as content negotiation, are well established, the semantics behind these abstractions are not. We extend the library science resource model Functional Requirements for Bibliographic Records (FRBR) with cryptographic message and content digests to create a Functional Requirements for Information Resources (FRIR) ontology that is integrated with the W3C Provenance Ontology (PROV-O) to model HTTP transactions in a way that clarifies the many relationships between a given URL and all representations received from its request. Use of this model provides fine-grained provenance explanations that are complementary to existing explanations of web resources. Furthermore, we provide a formal explanation of the relationship between HTTP URLs and their representations that conforms with the existing World Wide Web architecture. This establishes the semiotic relationships between different information abstractions, their symbols, and the things they represent.
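The message/content distinction at the heart of this abstract can be made concrete with digests computed at the two levels. The sketch below is illustrative only; the function names and the choice of canonicalization are assumptions for the sketch, not FRIR's actual definitions.

```python
# Minimal sketch (not the authors' implementation) of FRIR's core idea:
# the same URL can yield responses whose raw message bytes differ while
# the underlying content is the same.
import hashlib

def message_digest(raw_bytes: bytes) -> str:
    """Digest of the exact bit pattern received in one HTTP transaction."""
    return hashlib.sha256(raw_bytes).hexdigest()

def content_digest(text: str) -> str:
    """Digest of the information itself, after normalizing a
    representation-level detail (here: line endings)."""
    canonical = text.replace("\r\n", "\n").encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()

# Two transactions returning the "same" document with different line endings:
a = b"Hello, provenance!\n"
b = b"Hello, provenance!\r\n"
assert message_digest(a) != message_digest(b)                     # distinct messages
assert content_digest(a.decode()) == content_digest(b.decode())   # same content
```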
... In another direction, we can consider fingerprints and message authentication. McCusker et al. [6], in "Functional requirements for information resource provenance on the web," discussed the many different ways in which the semantics of HTTP transactions can be interpreted. The mechanisms related to these abstractions, such as content negotiation, are well established, but the semantics behind these abstractions are not. Information resources are critical to understanding requests made through URIs. ...
... Similar methods to the ones presented in this paper, i.e. calculating hash values in a format-independent manner, have been proposed to track the provenance of data sets [13]. This has been used to define a conceptualization of multi-level identities for digital works based on cryptographic digests and formal semantics, covering different conceptual levels from single HTTP transactions to high-level content identifiers [14]. ...
Article
Full-text available
The current Web has no general mechanisms to make digital artifacts — such as datasets, code, texts, and images — verifiable and permanent. For digital artifacts that are supposed to be immutable, there is moreover no commonly accepted method to enforce this immutability. These shortcomings have a serious negative impact on the ability to reproduce the results of processes that rely on Web resources, which in turn heavily impacts areas such as science where reproducibility is important. To solve this problem, we propose trusty URIs containing cryptographic hash values. We show how trusty URIs can be used for the verification of digital artifacts, in a manner that is independent of the serialization format in the case of structured data files such as nanopublications. We demonstrate how the contents of these files become immutable, including dependencies to external digital artifacts and thereby extending the range of verifiability to the entire reference tree. Our approach sticks to the core principles of the Web, namely openness and decentralized architecture, and is fully compatible with existing standards and protocols. Evaluation of our reference implementations shows that these design goals are indeed accomplished by our approach, and that it remains practical even for very large files.
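As a rough sketch of the core idea (simplified; the actual trusty URI specification defines its own module codes and encoding rules, and the `.`-separator and function names here are assumptions), a hash-bearing URI can be generated and verified like this:

```python
# Illustrative sketch of a hash-bearing URI: append a cryptographic hash
# of an artifact's content to its URI so the identifier itself supports
# verification of what it denotes.
import base64
import hashlib

def make_hash_uri(base_uri: str, content: bytes) -> str:
    digest = hashlib.sha256(content).digest()
    token = base64.urlsafe_b64encode(digest).decode().rstrip("=")
    return f"{base_uri}.{token}"

def verify(uri: str, content: bytes) -> bool:
    expected = uri.rsplit(".", 1)[-1]
    digest = hashlib.sha256(content).digest()
    return base64.urlsafe_b64encode(digest).decode().rstrip("=") == expected

data = b"example nanopublication content"
uri = make_hash_uri("http://example.org/np1", data)
assert verify(uri, data)
assert not verify(uri, b"tampered content")
```

Because the hash token uses only URL-safe characters, the resulting identifier remains an ordinary URI that existing web infrastructure can resolve unchanged.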
... Similar methods to the ones presented in this paper, i.e. calculating hash values in a format-independent manner, have been proposed to track the provenance of data sets [14]. This has been used to define a conceptualization of multi-level identities for digital works based on cryptographic digests and formal semantics, covering different conceptual levels from single HTTP transactions to high-level content identifiers [15]. ...
Conference Paper
Full-text available
To make digital resources on the web verifiable, immutable, and permanent, we propose a technique to include cryptographic hash values in URIs. We call them trusty URIs and we show how they can be used for approaches like nanopublications to make not only specific resources but their entire reference trees verifiable. Digital artifacts can be identified not only on the byte level but on more abstract levels such as RDF graphs, which means that resources keep their hash values even when presented in a different format. Our approach sticks to the core principles of the web, namely openness and decentralized architecture, is fully compatible with existing standards and protocols, and can therefore be used right away. Evaluation of our reference implementations shows that these desired properties are indeed accomplished by our approach, and that it remains practical even for very large files.
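The format independence described above can be approximated in a few lines: parse any serialization into a graph, re-serialize it canonically, and hash the result. This simplified sketch assumes rdflib 6+ (where serialize returns a string) and graphs without blank nodes, whose labels are not stable across serializations; the actual trusty URI modules handle blank nodes properly.

```python
# Sketch of format-independent hashing for RDF: two serializations of the
# same triples yield the same digest.
import hashlib
from rdflib import Graph  # pip install rdflib

def graph_digest(data: str, fmt: str) -> str:
    g = Graph()
    g.parse(data=data, format=fmt)
    nt = g.serialize(format="nt")
    lines = sorted(line for line in nt.splitlines() if line.strip())
    return hashlib.sha256("\n".join(lines).encode("utf-8")).hexdigest()

turtle = '@prefix ex: <http://example.org/> . ex:s ex:p "o" .'
xml = ('<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" '
       'xmlns:ex="http://example.org/">'
       '<rdf:Description rdf:about="http://example.org/s">'
       '<ex:p>o</ex:p></rdf:Description></rdf:RDF>')

# Same triples in two serializations yield the same digest:
assert graph_digest(turtle, "turtle") == graph_digest(xml, "xml")
```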
Article
The main vision behind Semantic Web data maintenance is to make the text and content of the Web interpretable and to support sophisticated search procedures over huge amounts of linked Web data. Because fraudulent text produced by humans can be found on the Web, much research has concentrated on automated approaches that autonomously identify and analyze Web content. Making digital artifacts on the Web, in the form of text, code, and other media, verifiable and reliable is a serious concern, because artifacts that are supposed to be immutable can still be modified, undermining the reproduction of results from processes that rely on Web resources. In this paper, we therefore propose a Decentralized Hash URI (DHURI) approach that addresses the immutability of digital artifacts by generating nanopublications with unique cryptographic hash values. The approach provides a decentralized, secure URI for identifying digital artifacts, together with a parallel structured representation of the data such as a nanopublication. We evaluate the approach against existing ones and describe its performance in a decentralized framework.
Article
Full-text available
The current Web has no general mechanisms to make digital artifacts, such as datasets, code, texts, and images, verifiable and permanent. For digital artifacts that are supposed to be immutable, there is moreover no commonly accepted technique to enforce this immutability. These shortcomings have a serious negative impact on the ability to reproduce the results of processes that rely on Web resources, which in turn heavily impacts areas such as science, where reproducibility is vital. To solve this problem, we propose trusty URIs containing cryptographic hash values. We show how trusty URIs can be used for the verification of digital artifacts in a manner that is independent of the serialization format in the case of structured data files such as nanopublications. We demonstrate how the contents of these files become immutable, including dependencies on external digital artifacts, thereby extending the range of verifiability to the entire reference tree. Our approach sticks to the core principles of the Web, namely openness and decentralized architecture, and is fully compatible with existing standards and protocols. Evaluation of our reference implementations shows that these design goals are indeed accomplished by our approach and that it remains practical even for very large files.
Article
Full-text available
One current challenge in linked science is to adequately describe where a piece of information in the linked science cloud came from. Provenance models, such as the Proof Markup Language (PML), have developed methods for expressing simple relationships between information and the sources of information. We argue that the representation of where information comes from is central to trusting linked data in scientific applications. We introduce the notion of a model of an information source and the usage of that source to obtain information by describing the Proof Markup Language's notion of source usage, and we show how this relationship can be modeled in a library science schema, Functional Requirements for Bibliographic Records (FRBR). We discuss how these kinds of representations are critical to provenance models.
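The kind of source-usage statement argued for here can be illustrated with the later W3C PROV-O vocabulary (PML defines its own terms for source usage, which are not reproduced here; the resources named below are hypothetical):

```python
# Recording where a piece of information came from, using PROV-O terms.
from rdflib import Graph, Namespace  # pip install rdflib
from rdflib.namespace import RDF

PROV = Namespace("http://www.w3.org/ns/prov#")
EX = Namespace("http://example.org/")  # hypothetical resources

g = Graph()
g.bind("prov", PROV)
g.add((EX.measurement, RDF.type, PROV.Entity))
g.add((EX.sensorLog, RDF.type, PROV.Entity))
# The measurement was derived from, and has as its primary source,
# the sensor log it was obtained from.
g.add((EX.measurement, PROV.wasDerivedFrom, EX.sensorLog))
g.add((EX.measurement, PROV.hadPrimarySource, EX.sensorLog))

print(g.serialize(format="turtle"))
```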
Article
The formal charge for the IFLA study involving international bibliography standards was to delineate the functions that are performed by the bibliographic record with respect to various media, applications, and user needs. The method used was the entity relationship analysis technique. Three groups of entities that are the key objects of interest to users of bibliographic records were defined. The primary group contains four entities: work, expression, manifestation, and item. The second group includes entities responsible for the intellectual or artistic content, production, or ownership of entities in the first group. The third group includes entities that represent concepts, objects, events, and places. In the study we identified the attributes associated with each entity and the relationships that are most important to users. The attributes and relationships were mapped to the functional requirements for bibliographic records that were defined in terms of four user tasks: to find, identify, select, and obtain. Basic requirements for national bibliographic records were recommended based on the entity analysis. The recommendations of the study are compared with two standards, AACR and the Dublin Core, to place them into pragmatic context. The results of the study are being used in the review of the complete set of ISBDs as the initial benchmark in determining data elements for each format.
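A minimal sketch of the primary group's four entities and the chain between them (a work is realized through an expression, embodied in a manifestation, and exemplified by an item); class and field names are illustrative, not IFLA's normative definitions:

```python
# FRBR Group 1 entities as plain data structures.
from dataclasses import dataclass

@dataclass
class Work:            # a distinct intellectual or artistic creation
    title: str

@dataclass
class Expression:      # a realization of a work (e.g., a specific text)
    work: Work
    language: str

@dataclass
class Manifestation:   # a physical or digital embodiment (e.g., an edition)
    expression: Expression
    fmt: str

@dataclass
class Item:            # a single exemplar (e.g., one library copy)
    manifestation: Manifestation
    identifier: str

work = Work("Moby-Dick")
text = Expression(work, language="en")
edition = Manifestation(text, fmt="paperback, 1992 printing")
copy_ = Item(edition, identifier="library barcode 31234000123456")
# The four user tasks (find, identify, select, obtain) each target a
# different level of this chain.
```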