About
36 Publications
9,498 Reads
846 Citations
Publications (36)
Flash memory technologies rely on the flash translation layer (FTL) to manage out-of-place updates and garbage collection. Current FTL management schemes do not exploit the semantics of the accessed data. In this paper, we explore how semantic knowledge can be exploited to build and maintain indexes for stored data automatically. Data indexing is a c...
The emergence of novel hardware accelerators has powered the tremendous growth of machine learning in recent years. These accelerators deliver incomparable performance gains in processing high-volume matrix operators, particularly matrix multiplication, a core component of neural network training and inference. In this work, we explored opportuniti...
Varifocal storage (VS) presents a new architecture that coordinates application demands, hardware accelerators, and intelligent data storage devices to efficiently support various input resolutions of system components while maintaining flexibility and quality at no additional cost. Instead of faithfully shipping the raw data, the cross-la...
Approximate computing that works on less precise data leads to significant performance gains and energy-cost reductions for compute kernels. However, without leveraging the full-stack design of computer systems, modern computer architectures undermine the potential of approximate computing.
In this paper, we present Varifocal Storage, a dynamic mul...
Graph analytics play a key role in a number of applications such as social networks, drug discovery, and recommendation systems. Given the large size of graphs that may exceed the capacity of the main memory, application performance is bounded by storage access time. Out-of-core graph processing frameworks try to tackle this storage access bottlene...
In modern computing systems, object deserialization can become a surprisingly important bottleneck: in our test, a set of general-purpose, highly parallelized applications spends 64% of total execution time deserializing data into objects. This paper presents the Morpheus model, which allows applications to move such computations to a storage device...
Modern data center solid state drives (SSDs) integrate multiple general-purpose embedded cores to manage the flash translation layer, garbage collection, wear-leveling, and other tasks that improve the performance and reliability of SSDs. As the performance of these cores steadily improves, there are opportunities to repurpose these cores to perform applica...
Existing solid state drives (SSDs) provide flash-based out-of-band (OOB) data that can only be updated on a page write. Consequently, the metadata stored in their OOB region lack flexibility due to the idiosyncrasies of flash memory, incurring unnecessary flash write operations detrimental to device lifetime.
We propose PebbleSSD, an SSD with byte-...
As data sets grow and conventional processor performance scaling slows, data analytics move towards heterogeneous architectures that incorporate hardware accelerators (notably GPUs) to continue scaling performance. However, existing GPU-based databases fail to deal with big data applications efficiently: their execution model suffers from scalabili...
In high-performance computing systems, object deserialization can become a surprisingly important bottleneck: in our test, a set of general-purpose, highly parallelized applications spends 64% of total execution time deserializing data into objects.
This paper presents the Morpheus model, which allows applications to move such computations to a st...
This paper presents CDTT, a compiler framework that takes C/C++ code and automatically generates a binary that eliminates dynamically redundant code without programmer intervention. It does so by exploiting underlying hardware or software support for the data-triggered threads (DTT) programming and execution model. With the help of idempotence anal...
Many studies have demonstrated that students tend to learn less than instructors expect in CS1. In light of these studies, a natural question is: to what extent do these results hold for subsequent, upper-division computer science courses? In this paper we describe our work in creating high-level concept questions for an upper-division computer arc...
MLC flash memory is becoming more popular in computer systems ranging from sensor networks and embedded systems to large-scale server systems. However, MLC flash has many reliability concerns, including the potential for corruption due to supply voltage fluctuations. This paper characterizes MLC flash when the chip is underpowered (i.e., power fadin...
The data-triggered threads (DTT) programming and execution model can increase parallelism and eliminate redundant computation. However, the initial proposal requires significant architecture support, which impedes existing applications and architectures from taking advantage of this model. This work proposes a pure software solution that supports t...
Unlike threads in parallel programs created by conventional programming, data-triggered threads are initiated when a memory value is changed. By expressing computation through these threads, computation is executed only when the data changes and is skipped whenever the data does not change. The authors' model achieves performance speedups of up to...
Flash memory is quickly becoming a common component in computer systems ranging from music players to mission-critical server systems. As flash plays a more important role, data integrity in flash memories becomes a critical question. This paper examines one aspect of that data integrity by measuring the types of errors that occur when power fails...
This paper introduces the concept of data-triggered threads. Unlike threads in parallel programs in conventional programming models, these threads are initiated on a change to a memory location. This enables increased parallelism and the elimination of redundant, unnecessary computation. This paper focuses primarily on the latter. It is shown that...
The traditional virtual memory system has been designed for decades under the assumption that a magnetic disk serves as the secondary storage. Recently, flash memory has become a popular storage alternative for many portable devices, thanks to continuing improvements in its capacity and reliability and its much lower power consumption than mechanical hard drives. The characteristics of fla...
The traditional virtual memory system has been designed for decades under the assumption that a magnetic disk serves as the secondary storage. Recently, flash memory has become a popular storage alternative for many portable devices, thanks to continuing improvements in its capacity and reliability and its much lower power consumption than mechanical hard drives. The NAND flash memory is o...
Power consumption is an important design issue of current embedded systems. It has been shown that the instruction cache accounts for a significant portion of the power dissipation of the whole chip. Data caches also consume a significant portion of total processor power for multimedia applications because they are data intensive. In this paper, we...
Power consumption is an important design issue of current multimedia embedded systems. Data caches consume a significant portion of total processor power for multimedia applications because they are data intensive. In an integrated multimedia system, the cache architecture cannot be tuned specifically for an application. Therefore, a significant am...
In this paper, we propose U-MAC, a medium access control protocol designed for wireless sensor networks. Nowadays, wireless sensor networks consist of a large number of sensor nodes, which are generally battery-powered and cannot be recharged easily. Consequently, prolonging the lifetime of the nodes is an important issue when designing a MAC...
Prefetching is often used to overlap memory latency with computation for array-based applications. However, prefetching for pointer-intensive applications remains a challenge because of irregular memory access patterns and the pointer-chasing problem. In this paper, we propose a cooperative hardware/software prefetching framework, the push architec...
A wireless sensor network is composed of a great number of sensing devices equipped with limited power sources. Under such battery constraints, energy consumption is a key issue in designing sensor network applications. Some sensor products adopt an IEEE 802.11-like MAC protocol. However, IEEE 802.11 MAC is not a go...
Power consumption is an important design issue of current embedded systems. Data caches consume a significant portion of total processor power for data intensive applications. In this paper, we propose to utilize application-specific information for cache resource allocation to achieve energy saving, including cache bypassing, the mini-cache and wa...
As the Internet grows quickly, pornography, which in the past was printed only in small quantities, has become one of the widely distributed kinds of information on the Internet. However, pornography may be harmful to children and may reduce the efficiency of workers. In this paper, we design a simple scheme for detecting pornography. We exploit pri...