Abstract

Information discovery is looming as a major challenge with the growth of terabyte-scale datagrids. In order to manage their distributed data collections, many scientific organizations are adopting the San Diego Supercomputer Center's Storage Resource Broker (SRB). Data stored in SRB is indexed and retrieved via SRB's Metadata Catalogue (MCAT). MCAT focuses primarily on system or administrative metadata but supports domain-specific metadata through user-defined extensions. Although this approach provides maximum flexibility, it will lead to interoperability problems when searching across distributed collections described using different user-defined metadata schemas. The aim of the work described in this paper is to semantically augment SRB through an ontology and Resource Description Framework (RDF) descriptions in order to support arbitrary metadata schemata and to enhance the system's search capabilities. In particular, we describe a semantic search engine and interface built on top of an OWL ontology, RDF instance data and a Jena reasoning engine that enables easier and more sophisticated searching of heterogeneous data stored using SRB.
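The interoperability problem described in the abstract can be illustrated with a minimal sketch. It is not the paper's Jena-based implementation: it uses plain Python instead of an RDF store, and every property name, file path and value below is hypothetical. It assumes that each collection's user-defined property has been declared a sub-property of a shared ontology property, so that a single query over the shared property reaches records from both schemas.

```python
# Hypothetical vocabulary: none of these names come from SRB, MCAT or the paper.
# Each entry reads "key is a sub-property of value".
SUB_PROPERTY_OF = {
    "lab_a:specimenName": "common:sampleName",
    "lab_b:sample_id": "common:sampleName",
    "common:sampleName": "common:identifier",
}

# (subject, predicate, object) triples, as two collections with different
# user-defined schemas might record them. Paths are made up for illustration.
TRIPLES = [
    ("srb:/home/lab_a/scan1.dat", "lab_a:specimenName", "quartz-17"),
    ("srb:/home/lab_b/run42.h5", "lab_b:sample_id", "quartz-17"),
    ("srb:/home/lab_b/run43.h5", "lab_b:sample_id", "basalt-03"),
]


def super_properties(prop):
    """Return prop followed by every property it specializes."""
    seen = []
    while prop is not None and prop not in seen:
        seen.append(prop)
        prop = SUB_PROPERTY_OF.get(prop)
    return seen


def search(prop, value):
    """Find subjects whose value for prop, or for any sub-property of prop, equals value."""
    return sorted(
        s for s, p, o in TRIPLES
        if o == value and prop in super_properties(p)
    )
```

Under these assumptions, `search("common:sampleName", "quartz-17")` returns the files from both collections, whereas a query on `lab_a:specimenName` alone would find only the first. A reasoner applying sub-property entailment to an RDF store serves the same purpose at scale.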
... However, the problem of using a grid network and the problem of exploiting ontology resources exist. The first pertains to resource discovery [1,2,3]. With the vast resources on the grid network, it becomes fairly difficult to locate the required resources. ...
Article
Building an ontology resource network using grid computing is one method to exploit sparsely distributed ontology resources. Grid computing technology offers advantages in security and in ease of controlling resources. However, problems remain in resource discovery and in managing and provisioning the various scattered grid resources effectively. We addressed these problems by constructing a semantic grid service that automatically provides the optimal required grid resources, and by adopting the notion of the Virtual Organization (VO). The semantic grid service consists of various VOs, and a sub-ontology extraction and tailoring application was used as a proof of concept. The processing of the application was analyzed in order to verify the workability of the system.
Article
We describe the results of the RDF(S) activity within the Open Grid Forum (http://www.ogf.org) (OGF) Database Access and Integration Services (DAIS) Working Group (http://forge.gridforum.org/projects/dais-wg), whose objective is to develop standard service-based grid access mechanisms for data expressed in RDF and RDF Schema. We produce two specifications, focused on the provision of SPARQL querying capabilities for accessing RDF data and on a set of RDF Schema ontology handling primitives for creating, retrieving, updating, and deleting RDF data. In this paper we present a set of use cases that justify this work and an overview of these specifications, which will enter the editorial process at OGF25. We conclude by outlining the future work planned in the context of this standardization process. Copyright © 2009 John Wiley & Sons, Ltd.
Article
The Fedora architecture is an extensible framework for the storage, management, and dissemination of complex objects and the relationships among them. Fedora accommodates the aggregation of local and distributed content into digital objects and the association of services with objects. This allows an object to have several accessible representations, some of them dynamically produced. The architecture includes a generic Resource Description Framework (RDF)-based relationship model that represents relationships among objects and their components. Queries against these relationships are supported by an RDF triple store. The architecture is implemented as a web service, with all aspects of the complex object architecture and related management functions exposed through REST and SOAP interfaces. The implementation is available as open-source software, providing the foundation for a variety of end-user applications for digital libraries, archives, institutional repositories, and learning object systems.
Conference Paper
There is an increasing need to provide scientists and researchers as well as policy makers and the general public with value-added services integrating information spread over distributed heterogeneous repositories. In order to incorporate available data sets and scientific programs into a powerful information and computational system, it is mandatory to identify and exploit their semantic relationships. For this purpose, we advocate an ontological framework that captures these relations and allows the inference of valid combinations of scientific resources for the production of new data. We show how a knowledge base that commits to an ontology can be used to generate workflows on demand for the multiplicity of resources known to the system. To validate our ideas, we are currently developing a prototype for the area of Coastal Zone Management.
Conference Paper
The ARION system provides basic e-services for the search and retrieval of objects in scientific collections, such as datasets, simulation models and tools necessary for statistical and/or visualization processing. These collections may represent application software of scientific areas; they reside in geographically dispersed organizations and constitute the system content. The user may invoke on-line computations of scientific datasets when the latter are not found in the system. Thus, ARION provides the basic infrastructure for accessing and deriving scientific information in an open, distributed and federated system.
Conference Paper
A comprehensive study of the entire petabyte-scale archive of astronomical observatory data could yield new science and new knowledge in the field, but it has not been feasible so far for lack of an adequate data analysis environment. The Grid Datafarm architecture is designed for global petabyte-scale data-intensive computing. It provides a Grid file system with file replica management for fault tolerance and load balancing, together with parallel and distributed computing support for sets of files, to meet the requirements of such a study. In this paper, we discuss worldwide parallel and distributed data analysis in observational astronomy. The archival data is stored, replicated and dispersed in a Gfarm file system. All the astronomical data analysis tools successfully access files in the Gfarm file system without any code modification, using a syscall hooking library, regardless of file replica locations. Performance evaluation of the parallel data analysis in several ways shows that file-affinity process scheduling plays an essential role in scalable and efficient parallel file I/O performance. A data calibration tool shows scalable file I/O performance, achieving 5.9 GB/sec and 4.0 GB/sec for reading and writing FITS files, respectively, using 30 cluster nodes (60 CPUs). On-demand file replica creation mitigates the overhead of access concentration. Another tool shows a sixfold performance improvement for reading a shared file by creating file replicas.
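The idea behind file-affinity process scheduling can be sketched as follows. This is an illustration only, not Gfarm's scheduler: the function, node names and file names are hypothetical, and the policy shown (pick the least-loaded node that already holds a replica, otherwise the least-loaded node overall) is an assumed simplification.

```python
def schedule(files, replicas, nodes):
    """Map each file's processing task to a node, preferring nodes that hold a replica.

    files:    list of file names to process, one task per file
    replicas: dict mapping a file name to the nodes that store a copy of it
    nodes:    list of available compute nodes
    """
    load = {n: 0 for n in nodes}
    plan = {}
    for f in files:
        # Nodes that store the file locally, so the task can avoid remote I/O.
        local = [n for n in replicas.get(f, []) if n in load]
        # Fall back to any node when no available node holds a replica.
        candidates = local or list(nodes)
        # Least-loaded candidate; the node name breaks ties deterministically.
        chosen = min(candidates, key=lambda n: (load[n], n))
        load[chosen] += 1
        plan[f] = chosen
    return plan
```

With nodes `n1`, `n2`, `n3` and replicas `{"a.fits": ["n1"], "b.fits": ["n1", "n2"], "c.fits": []}`, the sketch places `a.fits` on `n1`, `b.fits` on `n2` (a replica holder with less load), and `c.fits` on `n3` as a fallback. Creating additional replicas on demand widens the set of local candidates for a heavily shared file.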
Conference Paper
Data Grids are becoming increasingly important in scientific communities for sharing large data collections and for archiving and disseminating them in a digital library framework. The Storage Resource Broker (SRB) provides transparent virtualized middleware for sharing data across distributed, heterogeneous data resources separated by different administrative and security domains. MySRB is a Web-based interface to the SRB that gives users friendly access to distributed collections brokered by the SRB. In this paper we briefly describe the use of the SRB infrastructure as a tool in the data grid architecture for building distributed data collections, digital libraries, and persistent archives. We also provide details about MySRB and its functionalities.
Conference Paper
The CombeChem project has designed and deployed an e-science infrastructure using a combination of grid and semantic Web technologies. In this paper we describe the datagrid element of the project, which provides a platform for sophisticated scientific queries and a rich record of experimental data and its provenance. This datagrid constitutes a significant deployment of semantic Web technologies and we propose it as an example of a 'semantic datagrid'.
Conference Paper
Traditional hierarchical namespaces are not sufficient for representing and managing the rich semantics of today's storage systems. In this paper, we discuss the principles of semantic-aware file stores. We identify the requirements of applications and end-users and propose to use a generic data model to capture and represent file semantics. A distinct challenge that we face is to handle dynamic evolution of the data schemas. Further, we outline a framework of basic relations and tools for generating and using semantic metadata. The proposed data model and framework are aimed to be more generic and flexible than what is offered by existing semantic file systems. We envision a range of applications and tools that will exploit semantic information, ranging from personal storage systems with features for advanced searching and roaming access, to enterprise systems supporting distributed data location or archiving.