Conference Paper

Sharing Linked Open Data over Peer-to-Peer Distributed File Systems: The Case of IPFS


Abstract

Linked Open Data (LOD) is a method of publishing machine-readable open data so that it can be interlinked and become more useful through semantic querying. The decentralized nature of the current LOD cloud relies on location-specific services, which is known to result in problems of availability and broken links. Current approaches to peer-to-peer (P2P) decentralized file systems could be used to support better availability and performance and provide permanent data, while preserving LOD principles. Applications would also benefit from mechanisms that ensure that LOD entities are permanent and immutable, independently of their original publishers. This paper outlines a first prototype design of LOD over the InterPlanetary File System (IPFS), a P2P system based on Merkle DAGs and a content-addressed block storage model. The fundamental ideas behind that implementation are discussed, and an example implementation on an early version of IPFS is described, showing the feasibility of such an approach and its main differentiating features.
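The mechanism the abstract alludes to can be illustrated with a short, hedged sketch: an RDF description of a LOD entity is serialized and added to a local IPFS node, which returns a content identifier (CID) derived from the bytes themselves, so the entity becomes permanent and immutable independently of its publisher. This is not the paper's code; it assumes a local IPFS daemon exposing the HTTP API on 127.0.0.1:5001, rdflib 6+ and requests, and the example namespace and triples are illustrative.

```python
import requests
from rdflib import Graph, Literal, Namespace, RDF, URIRef

EX = Namespace("http://example.org/")  # hypothetical vocabulary

g = Graph()
entity = URIRef(EX["dataset/1"])
g.add((entity, RDF.type, EX.Dataset))
g.add((entity, EX.title, Literal("A sample LOD entity")))

# Serialize deterministically (sorted N-Triples) so identical graphs
# always map to the same content identifier.
payload = "\n".join(sorted(g.serialize(format="nt").strip().splitlines())).encode()

# Add the bytes to the local IPFS node via its HTTP API.
resp = requests.post(
    "http://127.0.0.1:5001/api/v0/add",
    files={"file": ("entity.nt", payload)},
)
cid = resp.json()["Hash"]
print("Entity is content-addressed as:", cid)
```

Anyone holding the CID can fetch exactly these triples from any peer that stores them; changing a single triple yields a different CID, which is what makes the published entity immutable.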


... Indexing is not a capability directly supported by current public blockchains and thus requires some additional infrastructure, but it depends on referencing the blockchain for tamper-proof provenance. The same occurs with referencing (linking), which takes a different form when the referenced resources live in decentralized systems [13]; this does not fully resolve the problem and requires additional conventions. Pricing, rights management and interchange are not granted as a direct consequence of using a blockchain, but the blockchain enables the creation of new mechanisms for such applications. ...
... Decentralization entails the exposure of a heterogeneity of autonomous, incompatible media repositories, and it is unlikely that there will ever exist a single agreed-upon metadata schema (if one ever exists, it should be based on a system of incentives that is still to be conceived). This entails that the interpretation of the metadata also requires that the schemas, ontologies or terminologies they use are also deployed in immutable decentralized systems, but this has been the focus of previous research [13] and is not considered in further detail here. ...
... The implication is that terminologies can be considered another resource that must be subject to decentralized storage and attestation of authenticity via blockchain transactions. In [13] the distributed storage part is discussed, and the proof of authenticity could be achieved by means similar to those used for the metadata, in this case just registering the different versions of the KOS. ...
Conference Paper
Full-text available
Metadata repositories and services support the key functions required by the curation of digital resources, including description, management and provenance. They typically use conventional databases owned and managed by different kinds of organizations that are trusted by their users. Blockchains have emerged as a means to deploy decentralized databases secured from tampering and revision, opening the door to a new way of deploying that kind of digital archival system. In this paper we review and evaluate the functions of metadata in that new light and propose an approach in which a blockchain combined with other related technologies can be arranged in a particular way to obtain a decentralized solution for metadata supporting key functions. We discuss how the approach overcomes some weaknesses of current digital archives, along with its important implications for the management and sustainability of digital archives.
... The interoperability of data maintained and shared within a distributed, heterogeneous data-processing environment is essential. The sharing of linked data with distributed stakeholders for scalability and availability is presented by Sicilia et al. in [43]. ...
Article
Full-text available
Streamflow data acquisition with various techniques across dispersed river locations demands improvements in reliability and frequency. One characteristic of reliability is data ownership: the authority sharing the data vouches for its quality and is also responsible for wrong decisions triggered by incorrect data. Consensus-based crowdsourced data contribute to the generation of aggregated streamflow records, which are stored on a streamflow ledger, a Hyperledger Fabric-based ledger for rivers' streamflow data, while the InterPlanetary File System (IPFS), a distributed file storage system, is used to store policy documents and distribution schemes. Blockchain-based techniques thus improve data-intensive decision support systems. The distributed river streamflow data, measured and shared by distributed gauging officials and stored on the blockchain-based distributed storage system, contributes to the scalability, transparency, availability, and accessibility of shareable data. All stakeholders require quality data, with trust and provenance management through a persistent, linked, and immutable copy of the data. This technique resolves conflicts or disagreements over streamflow optimization and flood mitigation decisions. In a nutshell, blockchain technology with IPFS for off-chain storage of large files contributes twofold to the irrigation and flood mitigation domains. On one side, streamflow data aggregation has consensus from streamflow assessment by software agents and stakeholders. Secondly, a persistent copy of the data is shared among distributed stakeholders using the IPFS content-addressed file-sharing protocol, giving a common operating picture that improves collaboration and coordination among managers of irrigation systems and flood mitigation activities.
... After the user gets the hash index, he can access the system resources based on that index; the system finds the file location through a distributed hash table and returns the file. The IPFS network is distributed rather than fixed, with fine-grained networking, which is well suited to the particular network requirements of content distribution [27]. Table 1 presents the distributed hash tables in the IPFS storage design. ...
Article
Full-text available
In order to solve the problem of medical data sharing difficulties among medical institutions, we propose a secure medical data sharing scheme based on traceable ring signature and blockchain. Firstly, a certificateless traceable ring signature algorithm based on distributed key generation is proposed to provide data integrity with privacy preservation. Secondly, the smart contract combined with access control and Self-Controlling Object (SCO) can realize the decryption outsourcing and data sharing. In addition, the proposed scheme uses the InterPlanetary File System (IPFS) to store the oceans of medical privacy data, and encrypts the hash index to store, which improves the efficiency of data sharing. Finally, integrated with the blockchain, we can select the proxy node and upload the SCO package to the blockchain node for data sharing by using consensus mechanism. The security analysis shows that the scheme can realize the electronic health record (EHR) source tracking while achieving secure data sharing and privacy protection. In the performance evaluation, we compare the functionality with other schemes and conclude that our scheme is well-functional. Also, the time-consuming simulation of the algorithms in our scheme using the PBC library reflects a high practicality. The results show that the proposed scheme is secure in terms of medical data sharing and privacy protection, and feasible for data source tracking.
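The hash-index retrieval pattern cited above ([27] and the medical-records scheme alike) reduces, at the protocol level, to asking a node for the bytes behind a content identifier; the node locates providers through the distributed hash table. A minimal sketch, assuming a local IPFS daemon on 127.0.0.1:5001; the CID is a placeholder, not a real published object.

```python
import requests

cid = "QmPlaceholderCid..."  # hypothetical hash index obtained earlier

# /api/v0/cat resolves the CID (via the DHT if needed) and streams the file.
resp = requests.post("http://127.0.0.1:5001/api/v0/cat", params={"arg": cid})
resp.raise_for_status()
print(resp.content.decode("utf-8", errors="replace"))
```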
... Linked Data and Semantic Web. Studies [10,47] proposed the publication of Linked Data on top of IPFS as an extension to the Semantic Web. Objects in IPFS are structured and linked in a DAG, which translates well to the graph structure of linked data, as the relationships between objects allow machines to read and semantically process data. ...
Preprint
Full-text available
Recent data ownership initiatives such as GAIA-X attempt to shift from currently common centralised cloud storage solutions to decentralised alternatives, which gives users more control over their data. The InterPlanetary File System (IPFS) is a storage architecture which attempts to provide decentralised cloud storage by building on founding principles of P2P networking and content addressing. It combines approaches from previous research, such as Kademlia-based Distributed Hash Tables (DHTs), git's versioning model with cryptographic hashing, Merkle Trees, and content-based addressing in order to compose a protocol stack that supports both forward and backward compatibility of components. IPFS is used by more than 250k peers per month and serves tens of millions of requests per day, which makes it an interesting large-scale operational network to study. In this editorial, we provide an overview of the IPFS design and its core features, along with the opportunities that it opens as well as the challenges that it faces because of its properties. IPFS provides persistence of names, censorship circumvention, built-in file deduplication with integrity verification, and file delivery to external users via an HTTP gateway, among other properties.
... Sicilia et al. [26] explore publishing datasets on IPFS, either by storing the whole graph as a single object or by storing each dereferenceable entity as an object. Furthermore, they propose using IPNS to refer to the most recent dataset version. ...
Chapter
Full-text available
The growing web of data warrants better data management strategies. Data silos are single points of failure, and they face availability problems which lead to broken links. Furthermore, the dynamic nature of some datasets increases the need for a versioning scheme. In this work, we propose a novel architecture for a linked open data infrastructure, built on open decentralized technologies. IPFS is used for storage and retrieval of data, and the public Ethereum blockchain is used for naming, versioning and storing metadata of datasets. We furthermore exploit two mechanisms for maintaining a collection of relevant, high-quality datasets in a distributed manner in which participants are incentivized. The platform is shown to have a low barrier to entry and censorship resistance. It benefits from the fault tolerance of its underlying technologies. Furthermore, we validate the approach by implementing our solution.
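The IPNS mechanism mentioned in the snippet above pairs immutable per-version CIDs with one mutable, key-bound name for "the latest version". A hedged sketch of that pattern against a local daemon on 127.0.0.1:5001 (the CID is a placeholder; the daemon's default "self" key is used):

```python
import requests

API = "http://127.0.0.1:5001/api/v0"
latest_cid = "QmLatestVersionCid..."  # hypothetical CID of the newest dataset

# Point the node's IPNS name (backed by its key pair) at the latest version.
pub = requests.post(f"{API}/name/publish", params={"arg": f"/ipfs/{latest_cid}"})
name = pub.json()["Name"]

# Any peer can resolve the stable IPNS name to the current immutable CID.
res = requests.post(f"{API}/name/resolve", params={"arg": name})
print("IPNS name", name, "resolves to", res.json()["Path"])
```

Old versions stay retrievable by their CIDs; only the pointer moves.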
... IPFS [23] is a distributed storage model that cares about neither the location of a central server nor the file name and path, but only about the content of the file. After any file is placed on IPFS, a cryptographic hash value is calculated from the content of the file. ...
Chapter
The increasing demand for digital copyright transactions in the big data era has caused the emergence of piracy and infringement incidents. The traditional centralized digital copyright protection system, based on centralized authority management by authoritative organizations, has high registration costs. Blockchain, as a decentralized protocol, has the characteristics of decentralization, anonymity, auditability, security and persistency, which provide a solution to the current problems in the field of digital copyright. Combining with blockchain technology, this paper proposes a digital copyright registration and transaction system with a double-chain architecture. The blockchain, based on a digital copyright registration and management chain (RMC) and a digital copyright transaction and subscription chain (TSC), can prevent information disclosure and improve privacy protection by segregating account information and transaction information. Meanwhile, transforming the previous one-chain architecture into multiple parallel chains on the RMC and TSC can reduce redundant computation, improve consensus efficiency and increase the throughput rate. Experimental analysis shows that the system has the advantages of short registration time, high throughput and good scalability.
... Full and lightweight blockchain functions could be deployed on different physical nodes [8][9][10]. Besides, the InterPlanetary File System (IPFS) could be deployed as large-file storage for the blockchain [11][12][13][14]. ...
Conference Paper
Full-text available
Smart city construction brings great development opportunities for the video surveillance industry, but it also brings significant challenges. With the development of Internet of Things and artificial intelligence technology, video computing across different trust domains has become important and necessary for ensuring public safety and promoting smart city construction. With the continuous increase in front-end video equipment and massive video data, trust and security problems across different domains, network bandwidth and computing resources are gradually becoming bottlenecks of video computing. We propose a solution based on the integration of blockchain and the Internet of Things which brings together and distributes video data and computing resources to enhance edge video computing capability. Blockchain provides a distributed solution for trusted video computing across different trust domains. We evaluate the performance of the blockchain-based video computing solution across different trust domains, and the results show that this solution could provide trusted video computing across trust domains feasibly and efficiently. This solution implies a new trigger for trusted distributed applications and an innovative trusted economic ecosystem.
... Sicilia et al. [51] explore publishing datasets on IPFS, either by storing the whole graph as a single object or by storing each dereferenceable entity as an object. Furthermore, they propose using IPNS to refer to the most recent version of a dataset. ...
Thesis
Full-text available
The growing web of data warrants better data management strategies. Data silos are single points of failure, and they face availability problems which lead to broken links. Furthermore, the dynamic nature of some datasets increases the need for a versioning scheme. In this work, we propose a novel architecture for a linked open data infrastructure, built on open decentralized technologies. IPFS, a P2P globally available file system, is used for storage and retrieval of data, and the public Ethereum blockchain is used for naming, versioning and storing metadata of datasets. Triples are indexed via a hexastore, and the Triple Pattern Fragments framework is used for retrieval of data. We furthermore explore two mechanisms for maintaining a collection of relevant, high-quality datasets in a distributed manner in which participants are incentivized. The platform is shown to have a low barrier to entry and censorship resistance. It benefits from the fault tolerance of its underlying technologies and in most cases is expected to offer higher availability. An analysis in terms of the FAIR principles, showing improved findability, interoperability and accessibility for datasets published on the infrastructure, is further provided.
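The hexastore mentioned in the abstract is a triple index that materializes all six orderings of (subject, predicate, object), so any triple pattern can be answered by a prefix lookup. A minimal in-memory sketch of the idea (illustrative only, not the thesis code):

```python
from collections import defaultdict
from itertools import permutations

class Hexastore:
    def __init__(self):
        # One nested index per ordering, e.g. "spo": {s: {p: {o, ...}}}.
        self.indexes = {
            "".join(order): defaultdict(lambda: defaultdict(set))
            for order in permutations("spo")
        }

    def add(self, s, p, o):
        parts = {"s": s, "p": p, "o": o}
        for order, index in self.indexes.items():
            a, b, c = (parts[k] for k in order)
            index[a][b].add(c)

    def match_objects(self, s, p):
        """Answer the pattern (s, p, ?o) with one lookup in the spo index."""
        return self.indexes["spo"][s][p]

store = Hexastore()
store.add("ex:ds1", "ex:title", '"A sample dataset"')
print(store.match_objects("ex:ds1", "ex:title"))
```

The cost is roughly sixfold storage, traded for constant-time pattern resolution, which is what makes interfaces like Triple Pattern Fragments cheap to serve.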
Article
Full-text available
The recent and significant change in the architecture, engineering, and construction (AEC) industry is the increasing use of building information management (BIM) tools in design and how the generated digital data are consumed. Although designers have primarily published blueprints and model files, multiple parties are now involved in producing different aspects of the data and may be consuming data produced by others. This evolution has created new challenges in sharing and synchronizing information across organizational boundaries. Linked building data is a community effort identified as a potential means of overcoming the barriers to data interoperability. In this paper, we present how the industry can be strengthened using a peer-to-peer network based on the InterPlanetary File System (IPFS) framework to address the typical availability problems of web-based data. The paper particularly presents how the Linked Data serialization of the Industry Foundation Classes (IFC) standard can be used in the context of IPFS and benchmarks the performance of publishing models over IPFS versus the HTTP protocol. We show that linked building data, in particular IFC models converted into Resource Description Framework (RDF) graphs according to the ifcOWL ontology, can be implemented using the framework, with initial indications of significant benefits of a peer-to-peer network in terms of performance, offline access, and immutable version history. Two use cases in distributed collaborative environments in the AEC/facility management (FM) sector using evolving multidomain models are used to evaluate the work.
Article
Full-text available
The development and standardization of Semantic Web technologies has resulted in an unprecedented volume of data being published on the Web as Linked Data (LD). However, we observe widely varying data quality, ranging from extensively curated datasets to crowdsourced and extracted data of relatively low quality. In this article, we present the results of a systematic review of approaches for assessing the quality of LD. We gather existing approaches and analyze them qualitatively. In particular, we unify and formalize commonly used terminologies across papers related to data quality and provide a comprehensive list of 18 quality dimensions and 69 metrics. Additionally, we qualitatively analyze the 30 core approaches and 12 tools using a set of attributes. The aim of this article is to provide researchers and data curators with a comprehensive understanding of existing work, thereby encouraging further experimentation and the development of new approaches focused on data quality, specifically for LD.
Conference Paper
Full-text available
This paper addresses the development of trust in the use of Open Data through the incorporation of appropriate authentication and integrity parameters, for use by end-user Open Data application developers, in an architecture for trustworthy Open Data services. The advantage of this architecture is that it is far more scalable and is not another certificate-based hierarchy with the attendant problems of certificate revocation management. With the use of a Public File, if the key is compromised, it is a simple matter for the single responsible entity to replace the key pair with a new one and re-perform the data file signing process. Under this proposed architecture, the Open Data environment does not interfere with the internal security schemes that might be employed by the entity. However, this architecture incorporates, when needed, parameters from the entity, e.g. the person who authorized publishing as Open Data, at the time that datasets are created or added.
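The Public File scheme boils down to one key pair per publishing entity: sign each data file, publish the public key once, and let consumers verify before use. A minimal sketch of that sign-then-verify step (not the paper's exact protocol), using Ed25519 from the `cryptography` package; the file content is illustrative:

```python
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

private_key = Ed25519PrivateKey.generate()
public_key = private_key.public_key()  # would be distributed via the Public File

dataset = b"id,value\n1,42\n"          # illustrative open data file
signature = private_key.sign(dataset)  # published alongside the dataset

# Consumer side: verify the downloaded file against the published key.
try:
    public_key.verify(signature, dataset)
    print("Dataset is authentic and unmodified.")
except InvalidSignature:
    print("Dataset failed verification; do not trust it.")
```

If the key is compromised, the responsible entity replaces the pair in the Public File and re-signs, with no certificate hierarchy to revoke.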
Conference Paper
Full-text available
In discussions of open government data (hereafter simply open data or OGD) the question of how such data should be licensed, or whether they need to be licensed at all, has to date received only limited attention, at least in the academic literature. A common assumption, at least in the public sphere, is that a large proportion of the data collected and held by governments can and should be released free of any constraints or restrictions for all citizens, communities and organizations to access and use as they wish. However, even for data that do not fall within the ambit of personal privacy, the security of the state, or that are otherwise sensitive, it is far from obvious that this should be so; different forms of formal licensing may be appropriate in some cases and necessary in others. A libertarian, free-for-all approach to open government data is just one of a number of licensing options from which governments can choose.
Article
Full-text available
data.europeana.eu is an ongoing effort to make Europeana metadata available as Linked Open Data on the Web. It allows others to access metadata collected from Europeana data providers via standard Web technologies. The data are represented in the Europeana Data Model (EDM) and the described resources are addressable and dereferenceable by their URIs. Links between Europeana resources and other resources in the Linked Data Web will enable the discovery of semantically related resources. We developed an approach that allows Europeana data providers to opt in for their data to become Linked Data and converts their metadata to EDM, benefiting from Europeana's efforts to link them to semantically related resources on the Web. With that approach, we produced a first Linked Data version of Europeana and published the resulting datasets on the Web. We also gained experience with respect to EDM, HTTP URI design, and RDF store performance, and report on it in this paper.
Conference Paper
Full-text available
Provenance, a form of structured metadata designed to record the origin or source of information, can be instrumental in deciding whether information is to be trusted, how it can be integrated with other diverse information sources, and how to establish attribution of information to authors throughout its history. The PROV set of specifications, produced by the World Wide Web Consortium (W3C), is designed to promote the publication of provenance information on the Web, and offers a basis for interoperability across diverse provenance management systems. The PROV provenance model is deliberately generic and domain-agnostic, but extension mechanisms are available and can be exploited for modelling specific domains. This tutorial provides an account of these specifications. Starting from intuitive and informal examples that present idiomatic provenance patterns, it progressively introduces the relational model of provenance along with the constraints model for validation of provenance documents, and concludes with example applications that show the extension points in use.
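The idiomatic pattern the tutorial opens with (an entity derived from another, generated by an activity, attributed to an agent) can be written down with rdflib's built-in PROV namespace. A minimal sketch with illustrative URIs, not an example taken from the tutorial itself:

```python
from rdflib import Graph, Namespace
from rdflib.namespace import PROV, RDF

EX = Namespace("http://example.org/")  # hypothetical resources

g = Graph()
g.bind("prov", PROV)

g.add((EX.chart, RDF.type, PROV.Entity))
g.add((EX.dataset, RDF.type, PROV.Entity))
g.add((EX.plotting, RDF.type, PROV.Activity))
g.add((EX.alice, RDF.type, PROV.Agent))

# The core provenance assertions: derivation, generation, attribution.
g.add((EX.chart, PROV.wasDerivedFrom, EX.dataset))
g.add((EX.chart, PROV.wasGeneratedBy, EX.plotting))
g.add((EX.chart, PROV.wasAttributedTo, EX.alice))

print(g.serialize(format="turtle"))
```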
Article
Full-text available
The term Linked Data refers to a set of best practices for publishing and connecting structured data on the Web. These best practices have been adopted by an increasing number of data providers over the last three years, leading to the creation of a global data space containing billions of assertions: the Web of Data. In this article we present the concept and technical principles of Linked Data, and situate these within the broader context of related technological developments. We describe progress to date in publishing Linked Data on the Web, review applications that have been developed to exploit the Web of Data, and map out a research agenda for the Linked Data community as it moves forward.
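The first of these practices, dereferenceable HTTP URIs, is directly executable: requesting the URI with content negotiation yields RDF that describes the resource and links onward into other datasets. A minimal rdflib sketch, assuming the example URI still serves RDF (any public LOD URI would do):

```python
from rdflib import Graph, URIRef

uri = URIRef("http://dbpedia.org/resource/Berlin")  # illustrative LOD URI

g = Graph()
g.parse(uri)  # rdflib negotiates an RDF representation over HTTP

# Each (predicate, object) pair is an assertion about the resource; object
# URIs are the links that weave datasets into the Web of Data.
for p, o in list(g.predicate_objects(subject=uri))[:10]:
    print(p, "->", o)
```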
Article
Full-text available
Linked Open Data allows interlinking and integrating any kind of data on the web. Links between various data sources play a key role insofar as they allow software applications (e.g., browsers, search engines) to operate over the aggregated data space as if it were a single local database. In this new data space, where DBpedia, a dataset including structured information from Wikipedia, seems to be the central hub, we analyzed and highlighted outgoing links from this hub in an effort to discover broken links. The paper reports on an experiment to examine the causes of broken links and proposes some treatments for solving this problem.
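The kind of experiment the article reports can be approximated with a simple probe: dereference each outgoing link and record those that no longer resolve. A hedged sketch (not the authors' tooling); the URL list is a placeholder for the actual outgoing links:

```python
import requests

outgoing_links = [
    "http://dbpedia.org/resource/Berlin",  # illustrative targets only
    "http://example.org/gone",
]

broken = []
for url in outgoing_links:
    try:
        r = requests.head(url, allow_redirects=True, timeout=10)
        if r.status_code >= 400:
            broken.append((url, str(r.status_code)))
    except requests.RequestException as exc:
        broken.append((url, type(exc).__name__))

for url, reason in broken:
    print("broken:", url, reason)
```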
Article
Full-text available
In this paper we discuss the design and implementation of voiD, the "Vocabulary of Interlinked Datasets", a vocabulary that allows one to formally describe linked RDF datasets. We report on use cases for voiD, the current state of the specification, and its potential applications in the context of linked datasets.
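A voiD description is itself just RDF about a dataset. A minimal sketch using rdflib's bundled VOID namespace; the dataset URI, endpoint and triple count are illustrative:

```python
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF, VOID, XSD

EX = Namespace("http://example.org/")  # hypothetical publisher namespace

g = Graph()
g.bind("void", VOID)

ds = EX["my-dataset"]
g.add((ds, RDF.type, VOID.Dataset))
g.add((ds, VOID.sparqlEndpoint, URIRef("http://example.org/sparql")))
g.add((ds, VOID.triples, Literal(12345, datatype=XSD.integer)))

print(g.serialize(format="turtle"))
```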
Conference Paper
Full-text available
While Linked Open Data (LOD) has gained much attention in recent years, an analysis of the requirements and challenges concerning its usage from a database perspective is lacking. We argue that such a perspective is crucial for increasing the acceptance of LOD. In this paper, we compare the characteristics and constraints of relational databases with LOD, trying to understand the latter as a Web-scale database. We propose LOD-specific requirements beyond the established database rules and highlight research challenges, aiming to combine future efforts of the database research community and the Linked Data research community in this area.
Article
Full-text available
The transfer of scientific data has emerged as a significant challenge, as datasets continue to grow in size and demand for open access sharing increases. Current methods for file transfer do not scale well for large files and can cause long transfer times. In this study we present BioTorrents, a website that allows open access sharing of scientific data and uses the popular BitTorrent peer-to-peer file sharing technology. BioTorrents allows files to be transferred rapidly due to the sharing of bandwidth across multiple institutions and provides more reliable file transfers due to the built-in error checking of the file sharing technology. BioTorrents contains multiple features, including keyword searching, category browsing, RSS feeds, torrent comments, and a discussion forum. BioTorrents is available at http://www.biotorrents.net.
Article
This paper starts from the warning given by Tim Berners-Lee about the present threats to the social web, with a view to analyzing the main limitations of the Web 2.0 paradigm (fragmentation, centralization, control and risks to privacy). The authors continue on to describe several proposals (federation and interoperability, distribution and free management of identity and privacy) that tackle those threats. Finally, the paper offers a brief comparative map of decentralized social web efforts and focuses on the specific case of Lorea/N-1, a Spain-based federation of free social networks originated in 2009. Along with Lorea/N-1's pioneering nature and technical possibilities, the paper concludes by referring to its adoption by the M15 movement and by discussing its current limitations as well as its potential implications. To this end, our research combines bibliographic revision of recent works on the social web and fieldwork within Lorea/N-1's developing group.
Conference Paper
The central idea of Linked Data is that data publishers support applications in discovering and integrating data by complying with a set of best practices in the areas of linking, vocabulary usage, and metadata provision. In 2011, the State of the LOD Cloud report analyzed the adoption of these best practices by linked datasets within different topical domains. The report was based on information that was provided by the dataset publishers themselves via the datahub.io Linked Data catalog. In this paper, we revisit and update the findings of the 2011 State of the LOD Cloud report based on a crawl of the Web of Linked Data conducted in April 2014. We analyze how the adoption of the different best practices has changed and present an overview of the linkage relationships between datasets in the form of an updated LOD cloud diagram, this time not based on information from dataset providers, but on data that can actually be retrieved by a Linked Data crawler. Among other findings, the number of linked datasets approximately doubled between 2011 and 2014, there is increased agreement on common vocabularies for describing certain types of entities, and provenance and license metadata are still rarely provided by the data sources.
Article
The Linked Data paradigm has enabled a huge shared infrastructure for connecting data from different domains, which can be browsed and queried together as a huge knowledge base. However, structured interlinked datasets in this Web of Data are not static but continuously evolving, which motivates the investigation of approaches to preserve Linked Data across time. In this article, we survey and analyse current techniques addressing the problem of archiving different versions of Semantic Web data, focusing on their space efficiency, the retrieval functionality they serve, and the performance of such operations.
Book
This book explains the Linked Data domain by adopting a bottom-up approach: it introduces the fundamental Semantic Web technologies and building blocks, which are then combined into methodologies and end-to-end examples for publishing datasets as Linked Data, and use cases that harness scholarly information and sensor data. It presents how Linked Data is used for web-scale data integration, information management and search. Special emphasis is given to the publication of Linked Data from relational databases as well as from real-time sensor data streams. The authors also trace the transformation from the document-based World Wide Web into a Web of Data. Materializing the Web of Linked Data is addressed to researchers and professionals studying software technologies, tools and approaches that drive the Linked Data ecosystem, and the Web in general.
Article
The InterPlanetary File System (IPFS) is a peer-to-peer distributed file system that seeks to connect all computing devices with the same system of files. In some ways, IPFS is similar to the Web, but IPFS could be seen as a single BitTorrent swarm exchanging objects within one Git repository. In other words, IPFS provides a high-throughput content-addressed block storage model with content-addressed hyperlinks. This forms a generalized Merkle DAG, a data structure upon which one can build versioned file systems, blockchains, and even a Permanent Web. IPFS combines a distributed hash table, an incentivized block exchange, and a self-certifying namespace. IPFS has no single point of failure, and nodes do not need to trust each other.
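The content addressing at the heart of this design can be caricatured in a few lines: chunk the file, name each chunk by the hash of its bytes, and hash a root object that lists the chunk names, giving a Merkle DAG whose root changes if any byte changes. A deliberately simplified sketch; real IPFS uses multihash-encoded CIDs and richer node formats:

```python
import hashlib
import json

def address(data):
    """Name a block by the SHA-256 of its bytes (a stand-in for a CID)."""
    return hashlib.sha256(data).hexdigest()

def add_file(data, chunk_size=4):
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    blocks = {address(c): c for c in chunks}            # leaf blocks
    root = json.dumps({"links": [address(c) for c in chunks]}).encode()
    blocks[address(root)] = root                        # root node of the DAG
    return address(root), blocks

root_cid, blocks = add_file(b"hello merkle dag")
print("root:", root_cid)

# Changing any byte changes a leaf hash, and therefore the root hash:
assert add_file(b"hello merkle dag!")[0] != root_cid
```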
Article
In this article, based on data collected through interviews and a workshop, the benefits and adoption barriers for open data have been derived. The results suggest that a conceptually simplistic view is often adopted with regard to open data, which automatically correlates the publicizing of data with use and benefits. Also, five “myths” concerning open data are presented, which place the expectations within a realistic perspective. Further, the recommendation is provided that such projects should take a user's view.
Article
In developing open data policies, governments aim to stimulate and guide the publication of government data and to gain advantages from its use. Currently there is a multiplicity of open data policies at various levels of government, whereas very little systematic and structured research has been done on the issues that are covered by open data policies, their intent and actual impact. Furthermore, no suitable framework for comparing open data policies is available, as open data is a recent phenomenon and is thus in an early stage of development. In order to help bring about a better understanding of the common and differentiating elements in the policies and to identify the factors affecting the variation in policies, this paper develops a framework for comparing open data policies. The framework includes the factors of environment and context, policy content, performance indicators and public values. Using this framework, seven Dutch governmental policies at different government levels are compared. The comparison shows both similarities and differences among open data policies, providing opportunities to learn from each other's policies. The findings suggest that current policies are rather inward looking, open data policies can be improved by collaborating with other organizations, focusing on the impact of the policy, stimulating the use of open data and looking at the need to create a culture in which publicizing data is incorporated in daily working processes. The findings could contribute to the development of new open data policies and the improvement of existing open data policies.
Article
As the amount of data and devices on the Web experiences exponential growth, the issue of how to integrate such hugely heterogeneous components into a scalable system becomes increasingly important. REST has proven to be a viable solution for such large-scale information systems. It provides a set of architectural constraints that, when applied as a whole, result in benefits in terms of loose coupling, maintainability, evolvability, and scalability. Unfortunately, some of REST's constraints, such as the ones that demand self-descriptive messages or require the use of hypermedia as the engine of application state, are rarely implemented correctly. This results in tightly coupled and thus brittle systems. To solve these and other issues, we present JSON-LD, a community effort to standardize a media type targeted at machine-to-machine communication with inherent hypermedia support and rich semantics. Since JSON-LD is 100% compatible with traditional JSON, developers can continue to use their existing tools and libraries. As we show in the paper, JSON-LD can be used to build truly RESTful services that, at the same time, integrate the exposed data into the Semantic Web. The required additional design costs are significantly outweighed by the achievable benefits in terms of loose coupling, evolvability, scalability, self-descriptiveness, and maintainability.
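The crux of JSON-LD is that an ordinary JSON document plus a @context mapping its keys to shared IRIs is simultaneously plain JSON and Linked Data. A minimal sketch, assuming the `pyld` package; the document and context are illustrative:

```python
from pyld import jsonld

doc = {
    "@context": {
        "name": "http://schema.org/name",
        "homepage": {"@id": "http://schema.org/url", "@type": "@id"},
    },
    "name": "Jane Doe",
    "homepage": "http://example.org/jane",
}

# Expansion resolves every term against the context, yielding full IRIs
# that any Linked Data consumer can process without the original context.
print(jsonld.expand(doc))
```

Existing JSON tooling keeps working on the same payload; only consumers that care about the semantics need to expand it.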
Methodologies and software tools
  • N. Konstantinou
  • D. E. Spanos
An efficient and trustworthy P2P and social network integrated file sharing system
  • G. Liu
  • H. Shen
  • L. Ward
From open data to information justice
  • J. A. Johnson