R. Appuswamy
EURECOM · Data Science Department

About

53 Publications
6,381 Reads
484 Citations
Since 2017: 30 research items, 331 citations
[Chart: citations per year, 2017–2023 (0–60 range)]

Publications (53)
Preprint
Full-text available
DNA is a promising candidate for long-term data storage due to its high density and endurance. The key challenge in DNA storage today is the cost of synthesis. In this work, we propose composite motifs, a framework that uses a mixture of prefabricated motifs as building blocks to reduce synthesis cost by scaling logical density. To write data, we...
Chapter
The growing adoption of AI and data analytics in various sectors has resulted in digital preservation emerging as a cross-sectoral problem that affects everyone, from data-driven enterprises to memory institutions. As all contemporary storage media suffer from fundamental density and durability limitations, researchers have started investigati...
Preprint
Full-text available
The surge in demand for cost-effective, durable long-term archival media, coupled with density limitations of contemporary magnetic media, has resulted in synthetic DNA emerging as a promising new alternative. Today, the limiting factor for DNA-based data archival is the cost of writing (synthesis) and reading (sequencing) DNA. Newer techniques tha...
Article
"How would you archive databases for the next 60 years such that they incur no migration cost, and they remain usable in 2080?" This was an open challenge raised by digital preservation experts from the Landesarchiv of Baden-W¨urttemberg [12], who, similar to other memory institutions (archives, museums, libraries, etc.), have faced several challen...
Article
In the last few years, SIGMOD and VLDB have intensified efforts to encourage, facilitate, and establish reproducibility as a key process for accepted research papers, awarding them with the Reproducibility badge. In addition, complementary efforts have focused on increasing the sharing of accompanying artifacts of published work (code, scripts, dat...
Article
Full-text available
Background: Improvements in sequencing technology continue to drive sequencing cost towards $100 per genome. However, mapping sequenced data to a reference genome remains a computationally intensive task due to the dependence on edit distance for dealing with INDELs and mismatches introduced by sequencing. All modern aligners use seed–filter–extend...
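To make the paradigm concrete, here is a minimal sketch of seed-filter-extend alignment. Everything in it is an illustrative assumption (toy seed length K, a two-seed vote threshold, plain dynamic-programming edit distance for extension); it is not the aligner proposed in this paper.

from collections import defaultdict

K = 5  # seed length (toy value; real aligners use much longer seeds)

def build_index(reference):
    """Seed: index every K-mer of the reference by its position."""
    index = defaultdict(list)
    for i in range(len(reference) - K + 1):
        index[reference[i:i + K]].append(i)
    return index

def edit_distance(a, b):
    """Extend: classic dynamic-programming edit distance, O(|a|*|b|)."""
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1,          # deletion
                                     dp[j - 1] + 1,      # insertion
                                     prev + (ca != cb))  # match/substitution
    return dp[len(b)]

def align(read, reference, index, max_edits=2):
    """Filter candidate diagonals by seed votes, then verify by extension."""
    votes = defaultdict(int)
    for i in range(len(read) - K + 1):
        for pos in index.get(read[i:i + K], ()):
            votes[pos - i] += 1  # implied alignment start in the reference
    best = None
    for start, count in votes.items():
        if count < 2 or start < 0 or start + len(read) > len(reference):
            continue  # filter: require two supporting seeds, stay in bounds
        d = edit_distance(read, reference[start:start + len(read)])
        if d <= max_edits and (best is None or d < best[1]):
            best = (start, d)
    return best  # (reference position, edit distance) or None

ref = "ACGTACGTTAGCCGATTACAGGATCCA"
print(align("TAGCCGATAACAG", ref, build_index(ref)))  # -> (8, 1)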
Article
In the age of the data explosion, research on solutions for the efficient long-term storage of infrequently used "cold" data is attracting great interest. However, even though existing storage systems offer sufficient capacity, they lack durability. Hard disks, flash, tape, or even optical storage have limited lifespans in the rang...
Preprint
Research on alternate media technologies, like film, synthetic DNA, and glass, for long-term data archival has received a lot of attention recently due to the media obsolescence issues faced by contemporary storage media like tape, Hard Disk Drives (HDD), and Solid State Disks (SSD). While researchers have developed novel layout and encoding techni...
Conference Paper
Full-text available
Online Transaction Processing (OLTP) deployments are migrating from on-premise to cloud settings in order to exploit the elasticity of cloud infrastructure which allows them to adapt to workload variations. However, cloud adaptation comes at the cost of redesigning the engine, which has led to the introduction of several, new, cloud-based transacti...
Preprint
Full-text available
Motivation: Improvements in sequencing technology continue to drive sequencing cost towards $100 per genome. However, mapping sequenced data to a reference genome remains a computationally intensive task due to the dependence on edit distance for dealing with indels and mismatches introduced by sequencing. All modern aligners use seed–filter–extend...
Article
Full-text available
Tracing the evolution of the five-minute rule to help identify imminent changes in the design of data management engines.
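As a quick refresher, the rule being traced is Gray and Putzolu's break-even formula for keeping a disk page cached in RAM; the worked numbers below are the classic 1997 figures from Gray and Graefe, used here as illustrative assumptions rather than values taken from this article:

\[
\text{BreakEvenInterval (s)} =
\frac{\text{PagesPerMBofRAM}}{\text{AccessesPerSecondPerDisk}}
\times
\frac{\text{PricePerDiskDrive}}{\text{PricePerMBofRAM}}
\]

With 8 KB pages (128 per MB of RAM), 64 random accesses/s per disk, a \$2000 disk, and DRAM at \$15/MB:

\[
\frac{128}{64} \times \frac{2000}{15} \approx 266\ \text{s} \approx 5\ \text{minutes}
\]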
Conference Paper
Full-text available
In the age of the data explosion, research on solutions for the efficient long-term storage of infrequently used "cold" data is attracting great interest. Recent studies have proven that, due to its biological properties, DNA is a strong candidate for the storage of digital information, also enabling data longevity. However, the biological...
Conference Paper
The abundance of available sensor and derived data from large scientific experiments, such as earth observation programs, radio astronomy sky surveys, and high-energy physics, already exceeds the storage hardware globally fabricated per year. To that end, cold storage data archives are the often-overlooked spearheads of modern big data analytics...
Preprint
Full-text available
The abundance of available sensor and derived data from large scientific experiments, such as earth observation programs, radio astronomy sky surveys, and high-energy physics, already exceeds the storage hardware globally fabricated per year. To that end, cold storage data archives are the often-overlooked spearheads of modern big data analytics...
Preprint
Full-text available
In the age of the digital media explosion, the amount of data being stored is increasing dramatically. However, even though existing storage systems offer sufficient capacity, they lack durability. Hard disks, flash, tape, or even optical storage have limited lifespans in the range of 5 to 20 years. Interestingly, recent studies...
Article
Modern server hardware is increasingly heterogeneous as hardware accelerators, such as GPUs, are used together with multicore CPUs to meet the computational demands of modern data analytics workloads. Unfortunately, query parallelization techniques used by analytical database engines are designed for homogeneous multicore servers, where query plan...
Preprint
Full-text available
Rapid advances in sequencing technologies are producing genomic data on an unprecedented scale. The first step of genomic data analysis, and often one of the most time-consuming, is sequence alignment, where sequenced reads must be aligned to a reference genome. Several years of research on alignment algorithms have led to the development of several...
Article
Main-memory OLTP engines are being increasingly deployed on multicore servers that provide abundant thread-level parallelism. However, recent research has shown that even the state-of-the-art OLTP engines are unable to exploit available parallelism for high contention workloads. While previous studies have shown the lack of scalability of all popul...
Conference Paper
Full-text available
Data loading has traditionally been considered a “one-time deal” – an offline process out of the critical path of query execution. The architecture of DBMSs is aligned with this assumption. Nevertheless, the rate at which data is produced and gathered nowadays has nullified the “one-off” assumption and has turned data loading into a major bottlenec...
Article
Full-text available
Enterprise databases use storage tiering to lower capital and operational expenses. In such a setting, data waterfalls from an SSD-based high-performance tier when it is "hot" (frequently accessed) to a disk-based capacity tier and finally to a tape-based archival tier when "cold" (rarely accessed). To address the unprecedented growth in the amount...
Conference Paper
Although scaling out with low-power cores is an alternative to power-hungry Intel Xeon processors for reducing power overheads, such cores have proven inadequate for complex, non-parallelizable workloads. On the other hand, with the introduction of the 64-bit ARMv8 architecture, traditionally low-power ARM processors have become powerful enough to run co...
Conference Paper
Multisocket multicores feature hardware islands: groups of cores that communicate fast among themselves and more slowly with other groups. With high-speed networking becoming a commodity, clusters of hardware islands with fast networks are becoming a preferred platform for high-end OLTP workloads. While the behavior of OLTP on multisockets is well understo...
Conference Paper
Improving the energy efficiency of database systems has emerged as an important topic of research over the past few years. While significant attention has been paid to optimizing the power consumption of traditional disk-based databases, little attention has been paid to the growing cost of DRAM power consumption in main-memory databases (MMDB). In t...
Article
Full-text available
The operating system storage stack is an important software component, but it faces several reliability threats. The research community has come up with many solutions to address individual parts of this reliability problem. However, when considering the bigger picture of constructing a highly reliable storage stack out of these individual solution...
Article
In recent times, two virtualization approaches have become dominant: hardware-level and operating system-level virtualization. They differ by where they draw the virtualization boundary between the virtualizing and the virtualized part of the system, resulting in vastly different properties. We argue that these two approaches are extremes in a cont...
Conference Paper
In this paper, we describe the emerging concept of namespace modules: operating system components that are responsible for constructing a hierarchical file system namespace based on one or more individual underlying file objects. We show that the likely presence of software bugs in such modules calls for the ability to recover from crashes, but tha...
Conference Paper
As enterprises shift from using direct-attached storage to network-based storage for housing primary data, flash-based, host-side caching has gained momentum as the primary latency reduction technique. In this paper, we make the case for integration of flash caching algorithms at the file level, as opposed to the conventional block-level integratio...
Conference Paper
Full-text available
In the last decade we have seen a huge deployment of cheap clusters to run data analytics workloads. The conventional wisdom in industry and academia is that scaling out using a cluster of commodity machines is better for these workloads than scaling up by adding more resources to a single server. Popular analytics infrastructures such as Hadoop ar...
Conference Paper
Several multilevel storage systems have been designed over the past few years that utilize RAM and flash-based SSDs in concert to cache data resident in HDD-based primary storage. The low cost/GB and non-volatility of SSDs relative to RAM have encouraged storage system designers to adopt inclusivity (between RAM and SSD) in the caching hierarchy. H...
Conference Paper
In this paper, we aim to improve the reliability of a central part of the operating system storage stack: the page cache. We consider two reliability threats: memory errors, where bits in DRAM are flipped due to cosmic rays, and software bugs, where programming errors may ultimately result in data corruption and crashes. We argue that by making use...
Conference Paper
Full-text available
In this paper, we look at two important failure classes in the storage stack: system crashes, where the whole system shuts down unexpectedly, and process crashes, where a part of the storage stack software fails due to an implementation bug. We investigate these two problems in the context of the Loris storage stack. We show how restoring metadata...
Conference Paper
Full-text available
Over the past few years, hybrid storage architectures that use high-performance SSDs in concert with high-density HDDs have received significant interest from both industry and academia, due to their capability to improve performance while reducing capital and operating costs. These hybrid architectures differ in their approach to integrating SSDs...
Conference Paper
Full-text available
With the amount of data increasing at an alarming rate, domain-specific user-level metadata management systems have emerged in several application areas to compensate for the shortcomings of file systems. Such systems provide domain-specific storage formats for performance-optimized metadata storage, search-based access interfaces for enabling decl...
Conference Paper
Traditional file systems made it possible for administrators to create file volumes on a one-file-volume-per-disk basis. With the advent of RAID algorithms and their integration at the block level, this “one file volume per disk” bond forced administrators to create a single, shared file volume across all users to maximize storage efficiency, ther...
Conference Paper
Full-text available
The storage stack in an operating system faces a number of dependability threats. The importance of keeping users' data safe makes this area particularly worth investigating. We briefly describe the main threats (disk device failures, whole-system failures, software bugs, and memory corruption), and outline the Loris storage stack that we developed...
Conference Paper
The arrangement of file systems and volume management/RAID systems, together commonly referred to as the storage stack, has remained the same for several decades, despite significant changes in hardware, software and usage scenarios. In this paper, we evaluate the traditional storage stack along three dimensions: reliability, heterogeneity and flex...
Article
Full-text available
The common storage stack as found in most operating systems has remained unchanged for several decades. In this stack, the RAID layer operates under the file system layer, at the block abstraction level. We argue that this arrangement of layers has fatal flaws. In this paper, we highlight its main problems, and present a new storage stack arrangeme...
Conference Paper
This work augments MINIX 3's failure-resilience mechanisms with novel disk-driver recovery strategies and guaranteed file-system data integrity. We propose a flexible filter-driver framework that operates transparently to both the file system and the disk driver and enforces different protection strategies. The filter uses checksumming and mirrorin...
Conference Paper
Full-text available
The Integrated Situational Awareness System (ISAS) initiative at the University of Florida Digital Worlds Institute has demonstrated an effective web services-enhanced graphically-based environment for globally-distributed operations ranging from humanitarian aid during large-scale environmental disasters to high-level collaboration and augmented d...
Article
Full-text available
With the widespread adoption of NAND-flash-based SSDs as a primary storage medium, heterogeneity in storage installations has become a norm rather than an exception. In this paper, we argue that the compatibility-driven traditional storage stack is fundamentally flawed and incapable of supporting and exploiting heterogeneity properly. After high-...
Article
The common storage stack as found in most operating systems has remained unchanged for several decades. In this stack, the RAID layer operates under the file system layer, at the block abstraction level. We argue that this arrangement of layers has fatal flaws. In this paper, we highlight its main problems, and present a new storage stack arrang...
