Conference Paper

Enabling cross-layer optimizations in storage systems with custom metadata

DOI: 10.1145/1383422.1383451
Conference: Proceedings of the 17th International Symposium on High-Performance Distributed Computing (HPDC-17 2008), 23-27 June 2008, Boston, MA, USA
Source: DBLP


Today, several data-storage systems allow applications to create and manage custom metadata to improve data search and navigability in large-scale storage systems. Our thesis is that, besides improving search and navigability, custom metadata can also serve as a two-way communication mechanism between applications and the storage layer to enable cross-layer optimizations in a uniform, application-independent and incremental fashion.


Available from: Samer Al-Kiswany
  • "We also explore file annotations with custom metadata to enable communication across layers in storage systems [18]. At a higher level, resource annotation in peer-to-peer grids and cross-layer communication through custom metadata in storage systems target the same opportunity: enabling better communication of resource characteristics between users and infrastructure allows the system to optimize for the requirements which matter most to users."
    ABSTRACT: Peer-to-peer grids are large-scale, dynamic environments where autonomous sites share computing resources. Producing and maintaining relevant and up-to-date resource information in such environments is a challenging problem, due to the grid scale, the resource heterogeneity, and the variety of user demand. This work proposes a peer-to-peer annotation approach where users can freely annotate available resources as a solution to this problem. We advocate that the proposed approach (i) is scalable, as the job of updating the resource information is divided among users; (ii) will improve resource utilization, by reducing the amount of resources which are allocated to users without matching their applications' constraints; and (iii) will allow resource allocators to increase users' utility, leveraging access to more detailed preference descriptions. The paper also discusses the challenges in implementing and deploying such an approach and presents solutions to tackle these challenges.
    Full-text · Conference Paper · Oct 2008
  • ABSTRACT: Content-based network storage has become a research hotspot in academia and industry [1]. To solve the problem of hit-rate decline caused by migration and to support content-based queries, we develop a new content-aware storage system that supports metadata retrieval to improve query performance. First, we extend the SCSI command descriptor block to enable the system to understand self-defined query requests. Second, the extracted metadata is encoded in Extensible Markup Language to improve universality. Third, according to the demands of information lifecycle management (ILM), we store data at different storage levels and use a corresponding query strategy to retrieve it. Fourth, as the file content identifier plays an important role in locating data and calculating block correlation, we use it to fetch files and sort query results through a friendly user interface. Finally, the experiments indicate that the retrieval strategy and sort algorithm have enhanced retrieval efficiency and precision.
    No preview · Article · Dec 2008 · Proceedings of SPIE - The International Society for Optical Engineering
  • ABSTRACT: In this work, we propose, implement and test a novel approach to the management of parallel I/O in high-performance computing. Our proposed approach is built upon three complementary ideas: (i) allowing users to place hints into the application code indicating high-level data access patterns, (ii) enabling an optimizing compiler to process these hints and develop I/O optimization strategies, and (iii) enhancing the I/O stack to accept these optimizations and process them across the different layers in the stack. We describe a general hint processing framework that accommodates this approach and demonstrate its potential by applying it to two sample problems: (i) shared storage cache management and (ii) I/O prefetching. In the former, our approach decides, at each program point of interest, the ideal set of data blocks to keep in shared storage caches in the I/O stack, and in the latter, the high-level data access pattern is propagated from the application layer to the parallel file system layer for prefetching data from the storage subsystem. Our approach is designed to complement and work synergistically with the MPI-IO and PVFS frameworks and exploits the characteristics of applications written using this software. We tested our approach using both synthetic data access patterns and disk I/O intensive application programs. The results collected indicate that the proposed approach improves over existing storage caching and I/O prefetching schemes by 28% and 66%, respectively.
    Preview · Conference Paper · Jan 2010