Conference Paper

STORAGEDB: Enhancing the Storage Sub-system with DBMS Functionalities

University of California, Santa Barbara
DOI: 10.1109/MSST.2005.34
Conference: Mass Storage Systems and Technologies, 2005. Proceedings. 22nd IEEE / 13th NASA Goddard Conference on
Source: DBLP


This paper proposes STORAGEDB, a paradigm for implementing storage virtualization using databases. It describes how the logical-to-physical mapping information is stored as tables within the database, how the application's incoming I/O requests are handled as database queries, and how I/O operations are bookkept as database transactions. In addition, STORAGEDB uses built-in DBMS features to support storage virtualization functionality; as an example, we describe how online table space migration can be used to support the reallocation of logical disks. Finally, we describe our modifications to a traditional RDBMS implementation that make it lightweight. Improving the performance of a traditional DBMS is critical for the acceptance of STORAGEDB, since performance overhead is considered a primary challenge in replacing existing storage virtualization engines. Our current lightweight RDBMS has an invocation path length 19 times shorter than the original. Compared with the open-source virtualization software LVM, the extra cost of STORAGEDB is within 20% of LVM in trace-driven tests (unlike STORAGEDB, LVM did not incur logging overhead). We consider these initial results a "stepping stone" in the paradigm of applying databases to storage virtualization.
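The abstract names three mappings: the logical-to-physical map as tables, I/O requests as queries, and I/O bookkeeping as transactions. The sketch below illustrates them using Python's standard-library `sqlite3` module, not the modified RDBMS the paper describes. The extent-based schema, the tables `extent_map` and `io_log`, and the functions `resolve`, `write_block`, and `migrate_extent` are all hypothetical. `migrate_extent` only rewrites a mapping row to mimic reallocating a logical disk. It omits the data copy that online table space migration would actually perform.

```python
import sqlite3

# Hypothetical schema (not from the paper): each row maps one extent of a
# logical disk (volume) to a contiguous run of blocks on a physical device.
SCHEMA = """
CREATE TABLE extent_map (
    volume_id    INTEGER NOT NULL,
    logical_lba  INTEGER NOT NULL,  -- first logical block of the extent
    nblocks      INTEGER NOT NULL,  -- extent length in blocks
    device_id    INTEGER NOT NULL,
    physical_lba INTEGER NOT NULL,  -- first physical block on the device
    PRIMARY KEY (volume_id, logical_lba)
);
CREATE TABLE io_log (
    seq         INTEGER PRIMARY KEY AUTOINCREMENT,
    volume_id   INTEGER NOT NULL,
    logical_lba INTEGER NOT NULL,
    op          TEXT NOT NULL
);
"""


def resolve(conn, volume_id, lba):
    """Handle an I/O request as a query: map a logical block to (device, physical block)."""
    row = conn.execute(
        """SELECT device_id, physical_lba + (? - logical_lba)
           FROM extent_map
           WHERE volume_id = ? AND logical_lba <= ? AND ? < logical_lba + nblocks""",
        (lba, volume_id, lba, lba),
    ).fetchone()
    if row is None:
        raise KeyError(f"block {lba} of volume {volume_id} is unmapped")
    return row


def write_block(conn, volume_id, lba):
    """Resolve a write and record it in io_log within a single transaction."""
    with conn:  # commits on success, rolls back if an exception escapes
        target = resolve(conn, volume_id, lba)
        conn.execute(
            "INSERT INTO io_log (volume_id, logical_lba, op) VALUES (?, ?, 'W')",
            (volume_id, lba),
        )
    return target


def migrate_extent(conn, volume_id, logical_lba, new_device, new_physical_lba):
    """Reallocate an extent by atomically rewriting its mapping row.

    The data copy is omitted here; a real engine must copy the blocks
    to the new location before switching the mapping.
    """
    with conn:
        cur = conn.execute(
            """UPDATE extent_map SET device_id = ?, physical_lba = ?
               WHERE volume_id = ? AND logical_lba = ?""",
            (new_device, new_physical_lba, volume_id, logical_lba),
        )
        if cur.rowcount != 1:
            raise KeyError(f"no extent at block {logical_lba} of volume {volume_id}")


# Usage: one volume made of two 1024-block extents on devices 7 and 8.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
with conn:
    conn.executemany(
        "INSERT INTO extent_map VALUES (?, ?, ?, ?, ?)",
        [(1, 0, 1024, 7, 4096), (1, 1024, 1024, 8, 0)],
    )
print(resolve(conn, 1, 10))        # (7, 4106)
print(write_block(conn, 1, 1030))  # (8, 6)
migrate_extent(conn, 1, 0, 9, 2048)
print(resolve(conn, 1, 10))        # (9, 2058)
```

The design keeps every remapping inside one transaction. A concurrent lookup therefore sees either the old or the new location of an extent, never a half-updated row. This is the property that makes a DBMS attractive for reallocation in the first place.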
