About
139
Publications
26,355
Reads
7,524
Citations
Publications (139)
We present a thorough analysis of the use of CXL-based heterogeneous systems. We built a cluster of server systems that combines different vendors' CPUs and various types of CXL devices. We further developed a heterogeneous memory benchmark suite, Heimdall, to profile the performance of such heterogeneous systems. By leveraging Heimdall, we unveile...
Emerging fast, byte-addressable persistent memory (PM) promises substantial storage performance gains compared to traditional disks. We present TPFS, a tiered file system that combines PM and slow disks to create a storage system with near-PM performance and large capacity. TPFS steers incoming file I/O to PM, DRAM, or disk depending on the synchro...
Scalable server-grade non-volatile RAM (NVRAM) DIMMs are commercially available with the release of Intel's Optane DIMM. Recent studies on Optane DIMM-based systems unveil discrepant performance characteristics, compared with what many researchers thought before the product release. To thoroughly analyze the source of the discrepancy and facilitate...
After nearly a decade of anticipation, scalable nonvolatile memory DIMMs are finally commercially available with the release of Intel's 3D XPoint DIMM. This new nonvolatile DIMM supports byte-granularity accesses with access times on the order of DRAM, while also providing data storage that survives power outages. Researchers have not idly waited f...
In systems with non-volatile main memories (NVMMs), programmers must carefully control the order in which writes become persistent. Otherwise, what will remain in persistence after a crash may be unusable upon recovery. Prior art has already explored semantic models for specifying this persist order, but most enforcement algorithms for the order ar...
Custom mechatronic devices offer personalized functionality, but also come with many non-functional requirements, such as current draw and servo power, that are unfamiliar to those inexperienced with electronics. The Echidna prototype system enables non-electrical engineers to move from conception to implementation with their mechatronic ideas by gen...
Programmable software-defined solid-state drives can move computing functions closer to storage.
Web programming technologies such as HTML, JavaScript, and CSS have become a popular choice for user interface design due to their capabilities: flexible interface, first-class networking, and available libraries. In parallel, driven by the standards set by the mobile companies, embedded device manufacturers now want to replicate these capabilitie...
Emerging fast, non-volatile memories will enable systems with large amounts of non-volatile main memory (NVMM) attached to the CPU memory bus, bringing the possibility of dramatic performance gains for IO-intensive applications. This paper analyzes the impact of state-of-the-art NVMM storage systems on some of these applications and explores how th...
Scalable nonvolatile memory DIMMs will finally be commercially available with the release of the Intel Optane DC Persistent Memory Module (or just "Optane DC PMM"). This new nonvolatile DIMM supports byte-granularity accesses with access times on the order of DRAM, while also providing data storage that survives power outages. This work comprises t...
We describe our experience teaching an intensive capstone course in which pairs of students build the hardware and software for a remote-controlled quad-rotor aircraft (i.e., a quadcopter or "drone") from scratch in one 10-week quarter. The course covers printed circuit board (PCB) design and assembly, basic control theory and sensor fusion, and e...
Physical computing and building robots have important benefits for novice engineers and computer scientists. However, lab time and hardware debugging come with a high cost of instructor time and effort. To reduce this workload, we implemented a computational design tool that simplifies printed circuit board (PCB) design and manufacture, assembly, a...
We describe our experience teaching an intensive capstone course in which pairs of students build the hardware and software for a remote-controlled quad-rotor aircraft (i.e., a quadcopter or "drone") from scratch in one 10-week quarter. The course covers printed circuit board (PCB) design and assembly, basic control theory and sensor fusion, and em...
In modern computing systems, object deserialization can become a surprisingly important bottleneck: in our test, a set of general-purpose, highly parallelized applications spends 64% of total execution time deserializing data into objects. This paper presents the Morpheus model, which allows applications to move such computations to a storage device...
Storage systems in data centers are an important component of large-scale online services. They typically perform replicated transactional operations for high data availability and integrity. Today, however, such operations suffer from high tail latency even with recent kernel bypass and storage optimizations, and thus affect the predictability of...
Interactive information retrieval services, such as enterprise search and document search, must provide relevant results with consistent, low response times in the face of rapidly growing data sets and query loads. These growing demands have led researchers to consider a wide range of optimizations to reduce response latency, including query proces...
In 2017, the capital expenditure of flash-based solid state drives (SSDs) keeps declining and the storage capacity of SSDs keeps increasing. As a result, the "selling point" of traditional spinning hard disk drives (HDDs) as backend storage (low cost and large capacity) is no longer unique, and eventually they will be replaced by l...
Emerging fast, persistent memories will enable systems that combine conventional DRAM with large amounts of non-volatile main memory (NVMM) and provide huge increases in storage performance. Fully realizing this potential requires fundamental changes in how system software manages, protects, and provides access to data that resides in NVMM. We addr...
Modern data center solid state drives (SSDs) integrate multiple general-purpose embedded cores to manage the flash translation layer, garbage collection, wear-leveling, etc., to improve the performance and the reliability of SSDs. As the performance of these cores steadily improves, there are opportunities to repurpose these cores to perform applica...
Existing solid state drives (SSDs) provide flash-based out-of-band (OOB) data that can only be updated on a page write. Consequently, the metadata stored in their OOB region lack flexibility due to the idiosyncrasies of flash memory, incurring unnecessary flash write operations detrimental to device lifetime. We propose PebbleSSD, an SSD with byte-...
Bitmap compression has been studied extensively in the database area and many efficient compression schemes were proposed, e.g., BBC, WAH, EWAH, and Roaring. Inverted list compression is also a well-studied topic in the information retrieval community and many inverted list compression algorithms were developed as well, e.g., VB, PforDelta, GroupVB...
Data structures for non-volatile memories have to be designed such that they can be atomically modified using transactions. Existing atomicity methods require data to be copied in the critical path, which significantly increases the latency of transactions. These overheads are further amplified for transactions on byte-addressable persistent memorie...
Inverted list compression is a topic that has been studied for 50 years due to its fundamental importance in numerous applications including information retrieval, databases, and graph analytics. Typically, an inverted list compression algorithm is evaluated on its space overhead and query processing time. Earlier list compression designs mainly fo...
As data sets grow and conventional processor performance scaling slows, data analytics move towards heterogeneous architectures that incorporate hardware accelerators (notably GPUs) to continue scaling performance. However, existing GPU-based databases fail to deal with big data applications efficiently: their execution model suffers from scalabili...
SSD-based in-storage computing (called "Smart SSDs") allows application-specific code to execute inside SSDs to exploit the high internal bandwidth and energy-efficient processors. As a result, Smart SSDs have been successfully deployed in many industry settings, e.g., Samsung, IBM, Teradata, and Oracle. Moreover, researchers have also demonstrate...
Recently, there has been renewed interest in in-storage computing in the context of solid state drives (SSDs), called "Smart SSDs." Smart SSDs allow application-specific code to execute inside SSDs. This allows applications to take advantage of the high internal bandwidth that Smart SSDs provide. This work studies the offloading of list intersect...
In high performance computing systems, object deserialization can become a surprisingly important bottleneck: in our test, a set of general-purpose, highly parallelized applications spends 64% of total execution time deserializing data into objects. This paper presents the Morpheus model, which allows applications to move such computations to a st...
We present a survey of non-volatile memory technology papers published between 2000 and 2014 in leading journals and conference proceedings in the area of integrated circuit design and semiconductor devices. We present a summary of the data provided in these papers and use that data to model basic aspects of their performance at an architectural le...
Next-generation non-volatile memories (NVMs) promise DRAM-like performance, persistence, and high density. They can attach directly to processors to form non-volatile main memory (NVMM) and offer the opportunity to build very low-latency storage systems. These high-performance storage systems would be especially useful in large-scale data center en...
We present the design of AppNVM, a software-defined, application-driven solid state drive (SSD) inspired by Software-Defined Networking. AppNVM exposes an application-defined interface without sacrificing performance by separating the data plane from the control plane. Applications control AppNVM SSDs by installing rules, which define (i) the logi...
Systems and methods of storage device access are provided, where the operating system copies permission and mapping information to the storage array and/or to the application program's memory. The application program can then access the storage device without the operating system's intervention and the storage device will check whether the applicat...
We explore the potential of making programmability a central feature of the SSD interface. Our prototype system, called Willow, allows programmers to augment and extend the semantics of an SSD with application-specific features without compromising file system protections. The SSD Apps running on Willow give application logic low-latency, high-band...
The cost of data movement in big-data systems motivates careful examination of near-data processing (NDP) frameworks. The concept of NDP was actively researched in the 1990s, but gained little commercial traction. After a decade-long dormancy, interest in this topic has spiked. A workshop on NDP was organized at MICRO-46 and was well attended. Give...
As chip designers face the prospect of increasingly dark silicon, there is increased interest in incorporating energy-efficient specialized coprocessors into general-purpose designs. For specialization to be a viable means of leveraging dark silicon, it must provide energy savings over the majority of execution for large, diverse workloads, and thi...
Emerging non-volatile storage (e.g., Phase Change Memory, STT-RAM) allows access to persistent data at latencies an order of magnitude lower than SSDs. The density and price gap between NVMs and denser storage make NVM economically most suitable as a cache for larger, more conventional storage (i.e., NAND flash-based SSDs and disks). Existing storage...
Phase Change Memory (PCM) presents an architectural challenge: writing to it is slow enough to make attaching it to a CPU's main memory controller impractical, yet reading from it is so fast that using it in a peripheral storage device would leave much of its performance potential untapped at low command queue depths, throttled by the high latencie...
Secure digital cards and embedded multimedia cards are pervasively used as secondary storage devices in portable electronics, such as smartphones and tablets. These devices cost under 70 cents per gigabyte. They deliver more than 4000 random IOPS and 70 MBps of sequential access bandwidth. Additionally, they operate at a peak power lower than 250 m...
WaveScalar is a dataflow instruction set architecture and execution model designed for scalable, low-complexity/high-performance processors that efficiently provides traditional memory semantics through a mechanism called wave-ordered memory. Wave-ordered memory enables “real-world” programs, written in any language, to be...
Accelerating a single thread in current parallel systems remains a challenging problem, because sequential threads do not naturally take advantage of the additional cores. Recent work shows that automatic extraction of pipeline parallelism is an effective way to speed up single thread execution. However, two problems remain challenging - load balan...
Transaction-based systems often rely on write-ahead logging (WAL) algorithms designed to maximize performance on disk-based storage. However, emerging fast, byte-addressable, non-volatile memory (NVM) technologies (e.g., phase-change memories, spin-transfer torque MRAMs, and the memristor) present very different performance characteristics, so blit...
Emerging non-volatile storage (e.g., Phase Change Memory, STT-RAM) allows access to persistent data at latencies an order of magnitude lower than SSDs. The density and price gap between NVMs and denser storage make NVM economically most suitable as a cache for larger, more conventional storage (i.e., NAND flash-based SSDs and disks). Existing storag...
This introduction to the special issue discusses "dark silicon" (chip area that must remain unclocked or underclocked) and highlights the five articles in the issue.
Emerging nonvolatile storage technologies promise orders-of-magnitude bandwidth increases and latency reductions, but fully realizing their potential requires minimizing storage software overhead and rethinking the roles of hardware and software in storage systems. The Web extra at http://youtu.be/pAALQ6k-CbE is an audio interview in which guest ed...
Recent years have witnessed significant gains in the adoption of flash technology due to increases in bit density, enabling higher capacities and lower prices. Unfortunately, these improvements come at a significant cost to performance with trends pointing toward worst-case flash program latencies on par with disk writes. We extend a conventional f...
Solid State Disks (SSDs) based on flash and other non-volatile memory technologies reduce storage latencies from 10s of milliseconds to 10s or 100s of microseconds, transforming previously inconsequential storage overheads into performance bottlenecks. This problem is especially acute in storage area network (SAN) environments where complex hardwar...
MLC Flash memory is getting more popular in computer systems ranging from sensor networks and embedded systems to large-scale server systems. However, MLC flash has many reliability concerns, including the potential for corruption due to supply voltage fluctuations. This paper characterizes MLC flash when the chip is underpowered (i.e., power fadin...
The data-intensive applications that will shape computing in the coming decades require scalable architectures that incorporate scalable data and compute resources and can support random requests to unstructured (e.g., logs) and semi-structured (e.g., large graph, XML) data sets. To explore the suitability of FPGAs for these computations, we are co...
Emerging non-volatile memory (NVM) technologies have DRAM-like latency with storage-like density, offering unique capability to analyze large data sets significantly faster than flash or disk storage. However, the hybrid nature of these NVM technologies, such as phase-change memory (PCM), makes it difficult to use them to best advantage in the memory-...
Microelectronic circuits exhibit increasing variations in performance, power consumption, and reliability parameters across the manufactured parts and across use of these parts over time in the field. These variations have led to increasing use of overdesign and guardbands in design and test to ensure yield and reliability with respect to a rigid s...
The data-intensive applications that will shape computing in the coming decades require scalable architectures that incorporate scalable data and compute resources and can support unstructured (e.g., logs) and semi-structured (e.g., large graph, XML) data sets. To explore the suitability of FPGAs for these computations, we are constructing an FPGA-...
Non-traditional parallelism provides parallel speedup for a single thread without the need to manually divide and coordinate computation. This paper describes coalition threading, a technique that seeks the ideal combination of traditional and non-traditional threading to make the best use of available hardware parallelism. Coalition threading prov...
Power consumption is a concern for helper-thread prefetching that uses extra cores to speed up the single-thread execution, because power consumption increases with each additional core. This article analyzes the impact of using power-saving techniques in the context of intercore prefetching (ICP), and shows that dynamic frequency scaling coupled w...
Flash memory is a promising new storage technology. To fully utilize future multi-level cell Flash memories, it is necessary to develop error correction coding schemes attuned to the underlying physical characteristics of Flash. Based on a careful inspection of fine-grained, experimentally-collected error patterns of TLC (three bits per cell) Flash...
Persistent, user-defined objects present an attractive abstraction for working with non-volatile program state. However, the slow speed of persistent storage (i.e., disk) has restricted their design and limited their performance. Fast, byte-addressable, non-volatile technologies, such as phase change memory, will remove this constraint and allow pr...
Emerging fast, non-volatile memories (e.g., phase change memories, spin-torque MRAMs, and the memristor) reduce storage access latencies by an order of magnitude compared to state-of-the-art flash-based SSDs. This improved performance means that software overheads that had little impact on the performance of flash-based systems can present serious...
In recent years, flash-based SSDs have grown enormously both in capacity and popularity. In high-performance enterprise storage applications, accelerating adoption of SSDs is predicated on the ability of manufacturers to deliver performance that far exceeds disks while closing the gap in cost per gigabyte. However, while flash density continues to i...
Emerging fast, non-volatile memories (e.g., phase change memories, spin-torque MRAMs, and the memristor) reduce storage access latencies by an order of magnitude compared to state-of-the-art flash-based SSDs. This improved performance means that software overheads that had little impact on the performance of flash-based systems can present seriou...
The Dark Silicon Age kicked off with the transition to multicore and will be characterized by a wild chase for seemingly ever-more insane architectural designs. At the heart of this transformation is the Utilization Wall, which states that, with each new process generation, the percentage of transistors that a chip can switch at full frequency is d...
Flash memory has become the storage medium of choice in portable consumer electronic applications, and high performance solid state drives (SSDs) are also being introduced into mobile computing, enterprise storage, data warehousing, and data-intensive computing systems. On the other hand, flash memory technologies present major challenges in the ar...
Transistor density continues to increase exponentially, but power dissipation per transistor is improving only slightly with each generation of Moore's law. Given the constant chip-level power budgets, this exponentially decreases the percentage of transistors that can switch at full frequency with each technology generation. Hence, while the trans...
As the complexity of FPGA-based systems scales, the importance of efficiently handling irregular code increases. Recent work has proposed Irregular Code Energy Reducers (ICERs), a high-level synthesis approach for FPGAs that offers significant energy reduction for irregular code compared to a soft core processor. ICERs target the hot-spots of progr...
We evaluate seven techniques for extracting unique signatures from NAND flash devices based on observable effects of process variation. Four of the techniques yield usable signatures that represent different trade-offs between speed, robustness, randomness, and wear imposed on the flash device. We describe how to use the signatures to prevent count...
We describe a prototype high-performance solid-state drive based on first-generation phase-change memory (PCM) devices called Onyx. Onyx has a capacity of 10 GB and connects to the host system via PCIe. We describe the internal architecture of Onyx including the PCM memory modules we constructed and the FPGA-based controller that manages them. Onyx...
Flash memory is quickly becoming a common component in computer systems ranging from music players to mission-critical server systems. As flash plays a more important role, data integrity in flash memories becomes a critical question. This paper examines one aspect of that data integrity by measuring the types of errors that occur when power fails...
This paper describes an architecture and FPGA synthesis tool chain for building specialized, energy-saving coprocessors called Irregular Code Energy Reducers (ICERs) for a wide range of unmodified C programs. FPGAs are increasingly used to build large-scale systems, and many large software systems contain relatively little code that is amenable to...
Mobile application processors are soon to replace desktop processors as the focus of innovation in microprocessor technology. Already, these processors have largely caught up to their more power-hungry cousins, supporting out-of-order execution and multicore processing. In the near future, the exponentially worsening problem of dark silicon is going...
Reliably erasing data from storage media (sanitizing the media) is a critical component of secure data management. While sanitizing entire disks and individual files is well-understood for hard drives, flash-based solid state disks have a very different internal architecture, so it is unclear whether hard drive techniques will work for SSDs as well...
Multicore processors have become ubiquitous in today's systems, but exploiting the parallelism they offer remains difficult, especially for legacy applications and applications with large serial components. The challenge, then, is to develop techniques that allow multiple cores to work in concert to accelerate a single thread. This paper describes i...
Complex “fat operators” are important contributors to the efficiency of specialized hardware. This paper introduces two new techniques for constructing efficient fat operators featuring up to dozens of operations with arbitrary and irregular data and memory dependencies. These techniques focus on minimizing critical path length and load-use delay,...
This article discusses the Greendroid mobile application processor. Dark silicon has emerged as the fundamental limiter in modern processor design. The Greendroid mobile application processor demonstrates an approach that uses dark silicon to execute general-purpose smart phone applications with less energy than today's most energy-efficient desi...
Single thread performance remains an important consideration even for multicore, multiprocessor systems. As a result, techniques for improving single thread performance using multiple cores have received considerable attention. This work describes a technique, software data spreading, that leverages the cache capacity of extra cores and extra socke...
Emerging non-volatile memory technologies such as phase change memory (PCM) promise to increase storage system performance by a wide margin relative to both conventional disks and flash-based SSDs. Realizing this potential will require significant changes to the way systems interact with storage devices as well as a rethinking of the storage device...
Non-volatile memories (such as NAND flash and phase change memories) have the potential to revolutionize computer systems. However, these technologies have complex behavior in terms of performance, reliability, and energy consumption that make fully exploiting their potential a complicated task. As device e