Transparent on-demand co-allocation data access for grids.
ABSTRACT: This paper presents a data sharing system called On-Demand data Co-Allocation (ODCA). ODCA combines the advantages of data co-allocation with the conventional on-demand data access scheme. It transfers only the file fragments an application needs, fetching them from multiple data sources at the moment the application requests them, thereby reducing data transmission time, wasted network bandwidth, and required storage space. Moreover, it lets legacy applications access distributed grid data transparently through native I/O system calls. Experimental results indicate that ODCA achieves shorter turnaround times for data-intensive applications than both the pre-staging and the on-demand access schemes.
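The abstract describes ODCA only at a high level. As a minimal sketch, assuming fixed-size blocks and replica sources modeled as in-memory callables (the class `OnDemandCoAllocReader`, the `Fetcher` type, and `make_source` are hypothetical, not ODCA's API), the following shows the core idea: a read fetches only the blocks it touches that are not already cached locally, and spreads those missing blocks round-robin across all sources in parallel. Intercepting native I/O system calls, which ODCA uses for transparency, is not modeled.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# Hypothetical fetch interface: fetch(block_index) -> bytes from one replica source.
Fetcher = Callable[[int], bytes]


class OnDemandCoAllocReader:
    """Sketch of on-demand, co-allocated reads (not the ODCA implementation).

    Blocks are fetched only when a read touches them; the missing blocks of
    one request are spread round-robin over all sources and fetched in parallel.
    """

    def __init__(self, sources: List[Fetcher], file_size: int, block_size: int = 4):
        self.sources = sources            # assumed non-empty
        self.file_size = file_size
        self.block_size = block_size
        self.cache: Dict[int, bytes] = {}  # local fragment cache
        self.blocks_transferred = 0

    def read(self, offset: int, length: int) -> bytes:
        end = min(offset + length, self.file_size)
        if offset >= end:
            return b""
        first = offset // self.block_size
        last = (end - 1) // self.block_size
        missing = [b for b in range(first, last + 1) if b not in self.cache]
        # Co-allocation: the i-th missing block goes to source i mod n.
        with ThreadPoolExecutor(max_workers=len(self.sources)) as pool:
            futures = {
                b: pool.submit(self.sources[i % len(self.sources)], b)
                for i, b in enumerate(missing)
            }
            for b, fut in futures.items():
                self.cache[b] = fut.result()
        self.blocks_transferred += len(missing)
        data = b"".join(self.cache[b] for b in range(first, last + 1))
        start = offset - first * self.block_size
        return data[start:start + (end - offset)]


def make_source(data: bytes, block_size: int) -> Fetcher:
    # Stand-in for a remote replica server that holds the whole file.
    return lambda b: data[b * block_size:(b + 1) * block_size]


FILE = b"abcdefghijklmnopqrstuvwxyz"
reader = OnDemandCoAllocReader([make_source(FILE, 4), make_source(FILE, 4)], len(FILE))
print(reader.read(5, 6))          # b'fghijk'
print(reader.blocks_transferred)  # 2 (only blocks 1 and 2 were fetched)
```

Round-robin is the simplest co-allocation policy; a throughput-aware policy that gives faster sources more blocks could replace it without changing the on-demand caching logic.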
Available from: iisc.ernet.in
ABSTRACT: Grids are used to execute parallel applications on remote resources. To execute a parallel application on a set of grid resources chosen by a user or a grid scheduler, the application's input data is segmented according to the application's data distribution, and the segments are distributed to the grid resources. The same input data may subsequently be used by different applications, leading to multiple copies (replicas) of parallel data segments on various grid resources. The data needed by a parallel application can be gathered from the existing replicas onto the computational resources the grid scheduler chooses for execution. In this work, we devise novel algorithms for determining the “nearest” replica sites holding the data segments needed by a parallel application executing on a set of resources, with the objective of minimizing the time required to transfer the segments from the replica sites to those resources. We have tested our algorithms on several kinds of experimental setups. We find that the best algorithm varies with the configuration of data servers and clients. In all cases, our algorithms outperformed the existing algorithms by at least 15%.
Future Generation Computer Systems, 07/2008; DOI: 10.1016/j.future.2008.01.001
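The abstract does not spell out the authors' algorithms. For illustration only, here is a simple greedy heuristic for the same objective, a swapped-in technique rather than the paper's method, assuming that each replica site serves its transfers one after another, that destinations can receive from several sites concurrently, and that per-link bandwidths are known estimates. The function `greedy_replica_selection` and its parameters are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def greedy_replica_selection(
    segments: Dict[str, Tuple[float, str]],   # segment -> (size in MB, destination resource)
    replicas: Dict[str, List[str]],           # segment -> sites holding a replica of it
    bandwidth: Dict[Tuple[str, str], float],  # (site, destination) -> estimated MB/s
) -> Tuple[Dict[str, str], float]:
    """Pick a source site per segment, greedily minimizing estimated finish time.

    Assumes each site sends its assigned segments sequentially, so a site's
    finish time is the sum of size / bandwidth over its assigned segments.
    """
    busy_until: Dict[str, float] = {}
    assignment: Dict[str, str] = {}
    # Largest segments first, so big transfers are placed while sites are idle.
    for seg in sorted(segments, key=lambda s: -segments[s][0]):
        size, dest = segments[seg]
        best_site: Optional[str] = None
        best_finish = float("inf")
        for site in replicas[seg]:
            finish = busy_until.get(site, 0.0) + size / bandwidth[(site, dest)]
            if finish < best_finish:
                best_site, best_finish = site, finish
        assignment[seg] = best_site
        busy_until[best_site] = best_finish
    makespan = max(busy_until.values(), default=0.0)  # time until the last transfer ends
    return assignment, makespan


segments = {"s1": (100.0, "r1"), "s2": (100.0, "r2"), "s3": (50.0, "r1")}
replicas = {"s1": ["A", "B"], "s2": ["A", "B"], "s3": ["B"]}
bandwidth = {("A", "r1"): 10.0, ("A", "r2"): 10.0, ("B", "r1"): 5.0, ("B", "r2"): 20.0}
assignment, makespan = greedy_replica_selection(segments, replicas, bandwidth)
print(assignment)  # {'s1': 'A', 's2': 'B', 's3': 'B'}
print(makespan)    # 15.0
```

The example shows why "nearest" depends on load as well as bandwidth: s2 goes to site B because its link to r2 is fast and B is idle, even though A also holds s2.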
ABSTRACT: Data Grids support data-intensive applications in wide-area Grid systems. They use local storage systems as distributed data stores by replicating datasets. Replication is a common technique in distributed environments because it can improve data availability, data access performance, and load balancing. Usually, a complete file is copied to many Grid sites for local access; however, a site may need only parts of a replica. Therefore, to use storage efficiently, a Grid site must be able to store only parts of a replica. In this paper, we propose a concept called fragmented replicas: when replicating, a site stores only the partial contents it needs locally. This can greatly reduce the storage space wasted on unused data. We also propose a block mapping procedure that determines the distribution of blocks across all available servers for later replica retrieval. With this procedure, a server can make its partial replica contents available to other members of the Grid system, and a client can retrieve a fragmented replica directly. Once the block mapping is determined, co-allocation schemes can be used to retrieve data sets from the available servers. Simulation results show that co-allocation schemes also improve download performance in a fragmented replication system.
Future Generation Computer Systems, 05/2007; DOI: 10.1016/j.future.2006.09.006
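The abstract names a block mapping procedure followed by co-allocation but does not define either. A minimal sketch under stated assumptions: each server advertises the set of block indices it stores, the map records which servers hold each block, and the co-allocation step simply balances block counts across holders, placing the most constrained blocks first. `build_block_map` and `co_allocate` are hypothetical names, not the paper's procedure.

```python
from collections import defaultdict
from typing import Dict, List, Set


def build_block_map(holdings: Dict[str, Set[int]]) -> Dict[int, List[str]]:
    """Map each block index to the (name-sorted) list of servers holding it."""
    block_map: Dict[int, List[str]] = defaultdict(list)
    for server in sorted(holdings):
        for block in holdings[server]:
            block_map[block].append(server)
    return dict(block_map)


def co_allocate(needed: List[int], block_map: Dict[int, List[str]]) -> Dict[str, List[int]]:
    """Assign each needed block to one holder, balancing blocks per server."""
    missing = [b for b in needed if b not in block_map]
    if missing:
        # A fragmented replica can be reassembled only if every block is held somewhere.
        raise ValueError(f"blocks not held by any server: {missing}")
    load: Dict[str, int] = defaultdict(int)
    plan: Dict[str, List[int]] = defaultdict(list)
    # Most constrained blocks (fewest holders) first, ties broken by block index.
    for block in sorted(needed, key=lambda b: (len(block_map[b]), b)):
        server = min(block_map[block], key=lambda s: (load[s], s))
        load[server] += 1
        plan[server].append(block)
    return {s: sorted(bs) for s, bs in plan.items()}


holdings = {"S1": {0, 1, 2, 3}, "S2": {2, 3, 4, 5}, "S3": {0, 5}}
block_map = build_block_map(holdings)
plan = co_allocate(list(range(6)), block_map)
print(plan)  # {'S1': [1, 2], 'S2': [3, 4], 'S3': [0, 5]}
```

Here no server holds the whole file, yet the three fragmented replicas together cover all six blocks, and each server contributes two of them.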
Conference Paper: A simple mass storage system for the SRB data grid
ABSTRACT: The functionality provided by Mass Storage Systems can be implemented using data grid technology. Data grids already provide many of the required features, including a logical name space and a storage repository abstraction. We demonstrate how the management of tape resources can be integrated into data grids. The resulting infrastructure can manage archival storage of digital entities on tape or other media while maintaining copies on distributed, remote disk caches that can be accessed through advanced discovery mechanisms. Data grids also provide additional levels of data management, including the ability to aggregate data into containers before storing them on tape and the ability to migrate collections across a hierarchy of storage devices.
Proceedings of the 20th IEEE/11th NASA Goddard Conference on Mass Storage Systems and Technologies (MSST 2003), 05/2003