Bruce Jacob

University of Maryland Global Campus | UMUC · Electrical & Computer Engineering

Professor

About

166 Publications
29,217 Reads
4,676 Citations


Publications (166)
Preprint
Full-text available
In-storage computing with modern solid-state drives (SSDs) enables developers to offload programs from the host to the SSD. It has been proven to be an effective approach to alleviating the I/O bottleneck. To facilitate in-storage computing, many frameworks have been proposed. However, few of them consider security as the priority for in-storage co...
Preprint
Full-text available
In-storage computing with modern solid-state drives (SSDs) enables developers to offload programs from the host to the SSD. It has been proven to be an effective approach to alleviating the I/O bottleneck. To facilitate in-storage computing, many frameworks have been proposed. However, few of them treat in-storage security as a first-class citizen. S...
Article
Many emerging non-volatile memories are compatible with CMOS logic, potentially enabling their integration into a CPU’s die. This article investigates such monolithically integrated CPU–main memory chips. We exploit non-volatile memories employing 3D crosspoint subarrays, such as resistive RAM (ReRAM), and integrate them over the CPU’s last-level c...
Article
Application performance on novel memory systems is typically estimated using a hardware simulator. The simulation is, however, time consuming, which limits the number of design options that can be explored within a practical length of time. Also, although memory simulators are typically well validated, current CPU simulators have various shortcoming...
Conference Paper
Cycle-accurate DRAM simulation has long been the dominant architecture simulation model for DRAM. Although accurate, its poor simulation speed has not improved for years, while many other architecture simulators, such as CPU and cache simulators, have moved away from cycle-accurate models for better performance. In this paper, we di...
Conference Paper
Cycle-accurate DRAM models are prevalent in today's computer architecture simulations. However, cycle-accurate models by design are time-consuming and not scalable. In this paper, we present a statistical approach to DRAM latency modeling. Unlike previous works, our approach converts DRAM latency modeling into a classification problem and employs ma...
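
As a rough illustration of the classification idea described in this entry, a per-request latency class can be predicted from simple controller-visible features; the features, classes, and classifier below are assumptions for the sketch, not the paper's actual model.

```python
# Illustrative sketch of casting DRAM latency prediction as classification.
# The request features and latency classes are assumptions for the example,
# not the feature set or model used in the paper.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Per-request features: [row_buffer_hit, bank_queue_depth, requests_in_flight, is_write]
X_train = np.array([
    [1, 0, 1, 0],
    [0, 3, 4, 0],
    [0, 8, 9, 1],
    [1, 1, 2, 0],
    [0, 6, 7, 0],
])
# Latency class per request: 0 = row-hit fast path, 1 = row miss, 2 = heavily queued
y_train = np.array([0, 1, 2, 0, 2])

clf = DecisionTreeClassifier(max_depth=3, random_state=0)
clf.fit(X_train, y_train)

# Classify a new request's latency instead of simulating it cycle by cycle.
print(clf.predict(np.array([[0, 5, 6, 0]])))
```
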
Conference Paper
Full-text available
With the anticipated scaling issues of DRAM memory technology and the increased need for higher density and bandwidth, several alternative memory technologies are being explored for the main memory system. One promising candidate is a variation of Resistive Random-Access Memory (ReRAM) which implements the memory bit-cells on Back-End-of-Line (BEOL...
Article
Non-volatile memory, such as resistive RAM (ReRAM), is compatible with standard CMOS logic processes, allowing a sizable main memory system to be integrated into a CPU's die. ReRAM bitcells are fabricated within crosspoint sub-arrays that leave the bulk of transistors underneath the sub-arrays vacant. This permits placing the memory system over the...
Conference Paper
Application performance on novel memory systems is typically estimated using a hardware simulator. The simulation is, however, time consuming, which limits the number of design options that can be explored within a practical length of time. Also, although memory simulators are typically well validated, current CPU simulators have various shortcoming...
Article
The approaching end of DRAM scaling and the expansion of emerging memory technologies are motivating substantial research into future memory systems. Novel memory systems are typically explored with hardware simulators that are slow and often have a simplified or obsolete abstraction of the CPU. This study presents PROFET, an analytical model that predicts how...
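
To give a sense of what an analytical predictor does, the toy calculation below splits a measured run into compute and memory-stall components and rescales the stall portion by a latency ratio; this is a simplification for illustration only, not PROFET's actual formulation.

```python
# Toy first-order illustration of analytical performance prediction, not PROFET's model.
def predict_runtime(measured_s, stall_fraction, base_latency_ns, target_latency_ns):
    compute = measured_s * (1.0 - stall_fraction)   # portion unaffected by memory
    stalls = measured_s * stall_fraction            # portion spent waiting on memory
    return compute + stalls * (target_latency_ns / base_latency_ns)

# An application measured at 100 s, 40% of it stalled on a 90 ns memory,
# estimated on a hypothetical 60 ns memory system:
print(predict_runtime(100.0, 0.40, 90.0, 60.0))  # ~86.7 s
```
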
Conference Paper
Full-text available
This paper presents the notion of a monolithic computer, a future computer architecture in which a CPU and a high-capacity main memory system are integrated in a single die. Such computers will become possible in the near future due to emerging non-volatile memory technology. In particular, we consider using resistive random access memory, or ReRAM...
Conference Paper
The community has accepted the need for detailed simulation of main memory. Currently, CPU simulators are usually coupled with cycle-accurate main memory simulators. However, coupling CPU and memory simulators is not a straightforward task, because some pieces of the circuitry between the last-level cache and the memory DIMMs could be easily...
Conference Paper
To feed the high degrees of parallelism in modern graphics processors and manycore CPU designs, DRAM manufacturers have created new DRAM architectures that deliver high bandwidth. This paper presents a simulation-based study of the most common forms of DRAM today: DDR3, DDR4, and LPDDR4 SDRAM; GDDR5 SGRAM; and two recent 3D-stacked architectures: H...
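
For context on the architectures compared in this entry, peak pin bandwidth follows directly from data rate and interface width; the configurations below are nominal figures for illustration, and sustained bandwidth in simulation is lower.

```python
# Peak theoretical bandwidth = data rate (MT/s) x interface width, for a few
# nominal configurations. Sustained bandwidth depends on access pattern,
# bank conflicts, and timing constraints.
def peak_bw_gb_per_s(mega_transfers_per_s, width_bits):
    return mega_transfers_per_s * (width_bits / 8) / 1000.0

configs = {
    "DDR3-1600, 64-bit channel":          (1600, 64),    # 12.8 GB/s
    "DDR4-2400, 64-bit channel":          (2400, 64),    # 19.2 GB/s
    "GDDR5 at 6 Gb/s/pin, 32-bit device": (6000, 32),    # 24.0 GB/s
    "HBM at 1 Gb/s/pin, 1024-bit stack":  (1000, 1024),  # 128.0 GB/s
}
for name, (rate, width) in configs.items():
    print(f"{name}: {peak_bw_gb_per_s(rate, width):.1f} GB/s")
```
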
Conference Paper
The overhead of DRAM refresh is increasing with each density generation. To help offset some of this overhead, JEDEC designed the modern Auto-Refresh command with a highly optimized architecture internal to the DRAM---an architecture that violates the timing rules external controllers must observe and obey during normal operation. Numerous refresh-...
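
As a rough illustration of why this overhead grows with density, the fraction of time a rank is unavailable is approximately tRFC/tREFI; the timing values below are typical DDR4 datasheet numbers, not figures from the paper.

```python
# Rough refresh overhead: fraction of time a rank is busy with Auto-Refresh.
# tREFI is the average interval between REF commands; tRFC is the refresh
# cycle time, which grows with device density. Typical DDR4 values assumed.
tREFI_ns = 7800  # 7.8 us average refresh interval
tRFC_ns = {"4 Gb": 260, "8 Gb": 350, "16 Gb": 550}

for density, trfc in tRFC_ns.items():
    print(f"{density} device: rank unavailable ~{trfc / tREFI_ns:.1%} of the time")
```
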
Conference Paper
Data movement is the limiting factor in modern supercomputing systems, as system performance drops by several orders of magnitude whenever applications need to move data. Therefore, focusing on low latency (e.g., low diameter) networks that also have high bisection bandwidth is critical. We present a cost/performance analysis of a wide range of hig...
Conference Paper
In this paper, we present a novel memory system checkpointing method that very efficiently stores the complete memory state at a given instant in time to a SSD. Our design relies on a modified memory controller that can issue commands directly to the SSD without relying on system software support and SSD controller firmware that is aware of the che...
Conference Paper
In-package DRAM caches are a promising new development that may enable the continued scaling of main memory by facilitating the creation of multi-level memory systems that can effectively utilize dense non-volatile memory technologies. However, determining an appropriate storage scheme for the large amount of meta-data needed by these new caches ha...
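
A quick back-of-envelope estimate shows why the metadata problem in this entry is hard; the block size and per-block metadata width below are illustrative assumptions.

```python
# Back-of-envelope estimate of DRAM-cache metadata size.
# Block size and per-block metadata width are illustrative assumptions.
cache_bytes = 4 * 2**30        # 4 GiB in-package DRAM cache
block_bytes = 64               # cache-line-sized blocks
meta_bits_per_block = 32       # tag + valid/dirty/replacement state (assumed)

blocks = cache_bytes // block_bytes
meta_mib = blocks * meta_bits_per_block / 8 / 2**20
print(f"{blocks:,} blocks -> ~{meta_mib:.0f} MiB of metadata")
# ~256 MiB: far too large for on-die SRAM, hence the storage schemes studied here.
```
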
Conference Paper
The increasing size of workloads has led to the development of new technologies and architectures that are intended to help address the capacity limitations of DRAM main memories. The proposed solutions fall into two categories: those that re-engineer Flash-based SSDs to further improve storage system performance and those that incorporate non-vola...
Article
Full-text available
We present a system architecture that uses high-efficiency processors as opposed to high-performance processors, NAND flash as byte-addressable main memory, and high-speed DRAM as a cache front-end for the flash. The main memory system is interconnected and presents a unified global address space to the client microprocessors. A single cabinet co...
Article
DRAM cells require periodic refreshing to preserve data. In JEDEC DDRx devices, a refresh operation is performed via an auto-refresh command, which refreshes multiple rows in multiple banks simultaneously. The internal implementation of auto-refresh is completely opaque outside the DRAM --- all the memory controller can do is to instruct the DRAM t...
Article
Current ultra-high-performance computers execute instructions at the rate of roughly 10 PFLOPS (10 quadrillion floating-point operations per second) and dissipate power in the range of 10 MW. The next generation will need to execute instructions at EFLOPS rates-100× as fast as today's-but without dissipating any more power. To achieve this challeng...
Article
Full-text available
Ever-growing application data footprints demand faster main memory with larger capacity. DRAM has been the technology choice for main memory due to its low latency and high density. However, DRAM cells must be refreshed periodically to preserve their content. Refresh operations negatively affect performance and power. Traditionally, the performance...
Conference Paper
As the size and speed of DRAM devices increase, the performance and energy overheads due to refresh become more significant. To reduce the refresh penalty, we propose techniques referred to collectively as “Coordinated Refresh”, in which scheduling of low power modes and refresh commands is coordinated so that most of the required refreshes are issued whe...
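
A minimal sketch of the coordination idea in this entry: drain pending refreshes while a rank is idle, before dropping into a low-power mode. The Rank class, fields, and threshold are hypothetical, not the paper's design.

```python
# Sketch: overlap refresh cost with idle time the rank would not serve anyway.
class Rank:
    def __init__(self, pending_refreshes=0, idle_cycles=0):
        self.pending_refreshes = pending_refreshes
        self.idle_cycles = idle_cycles

def next_action(rank, idle_threshold=100):
    if rank.idle_cycles < idle_threshold:
        return "SERVE_REQUESTS"
    if rank.pending_refreshes > 0:
        rank.pending_refreshes -= 1
        return "ISSUE_REFRESH"     # piggyback refresh on the idle period
    return "ENTER_POWER_DOWN"      # nothing owed: enter a low-power mode

rank = Rank(pending_refreshes=2, idle_cycles=150)
print(next_action(rank))  # ISSUE_REFRESH
```
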
Conference Paper
Full-text available
Large last-level caches (L3Cs) are frequently used to bridge the performance and power gap between processor and memory. Although traditional processors implement caches as SRAMs, technologies such as STT-RAM (MRAM) and eDRAM have been used and/or considered for the implementation of L3Cs. Each of these technologies has inherent weaknesses: SRAM i...
Article
Full-text available
The design and implementation of the commodity memory architecture has resulted in significant performance and capacity limitations. To circumvent these limitations, designers and vendors have begun to place intermediate logic between the CPU and DRAM. This additional logic has two functions: to control the DRAM and to communicate with the CPU over...
Article
The design and implementation of the commodity memory architecture has resulted in significant performance and capacity limitations. To circumvent these limitations, designers and vendors have begun to place intermediate logic between the CPU and DRAM. This additional logic has two functions: to control the DRAM and to communicate with the CPU over...
Conference Paper
Full-text available
This paper presents a cached DIMM architecture - a low-latency and energy-efficient memory system. Two techniques are proposed: the on-DIMM cache and the on-DIMM cache-aware address mapping scheme. These two techniques work together to reduce the memory access latency. Based on the benchmarks considered, our experiments show that compared to a conv...
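
For illustration of the kind of bit-level address mapping such a scheme manipulates, the field widths and ordering below are assumptions for the example, not the cache-aware mapping proposed in the paper.

```python
# Illustrative DRAM address decomposition; widths and ordering are assumed.
FIELDS = [("column", 10), ("bank", 3), ("rank", 1), ("row", 16)]  # low bits first

def decode(addr):
    fields = {}
    for name, bits in FIELDS:
        fields[name] = addr & ((1 << bits) - 1)
        addr >>= bits
    return fields

print(decode(0x3F2A1C40))
```
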
Article
Full-text available
ISR develops, applies, and teaches advanced methodologies of design and analysis to solve complex, hierarchical, heterogeneous, and dynamic problems of engineering technology and systems for industry and government. ISR is a permanent institute of the University of Maryland, within the A. James Clark School of Engineering. It is a graduated national...
Conference Paper
Full-text available
Phase change memory (PCM) features nonvolatility, high density, and superior power efficiency, making it one of the most promising candidates for future memory systems. This paper studies the impact of process variations on PCM based on a fast analytical model for determining PCM failure probability. The proposed analytical model takes PCM physical...
Article
Full-text available
In this paper we present DRAMSim2, a cycle accurate memory system simulator. The goal of DRAMSim2 is to be an accurate and publicly available DDR2/3 memory system model which can be used in both full system and trace-based simulations. We describe the process of validating DRAMSim2 timing against manufacturer Verilog models in an effort to prove th...
Article
Full-text available
As supercomputers grow, understanding their behavior and performance has become increasingly challenging. New hurdles in scalability, programmability, power consumption, reliability, cost, and cooling are emerging, along with new technologies such as 3D integration, GP-GPUs, silicon-photonics, and other "game changers". Currently, the HPC communit...
Article
Energy consumption is the fundamental barrier to exascale supercomputing and it is dominated by the cost of moving data from one point to another, not computation. Similarly, performance is dominated by data movement, not computation. The solution to this problem requires three critical technologies: 3D integration, optical chip-to-chip communicati...
Article
Full-text available
This DRAM architecture optimization, which appears transparent to the memory controller, significantly reduces power consumption. With trivial additional logic, using the posted-CAS command enables a finer-grained selection when activating a portion of the DRAM array. Experiments show that, in a high-use memory system, this approach can reduce tota...
Book
Full-text available
Today, computer-system optimization, at both the hardware and software levels, must consider the details of the memory system in its analysis; failing to do so yields systems that are increasingly inefficient as those systems become more complex. This lecture seeks to introduce the reader to the most important details of the memory system; it targe...
Conference Paper
As their prices decline, their storage capacities increase, and their endurance improves, NAND Flash Solid State Disks (SSD) provide an increasingly attractive alternative to Hard Disk Drives (HDD) for portable computing systems and PCs. This paper presents a study of NAND Flash SSD architectures and their management techniques, quantifying SSD per...
Chapter
Multiple DRAM devices are interconnected together to form a single memory system that is managed by a single memory controller. This chapter describes the basic terminologies and building blocks of DRAM memory systems. It examines the construction, organization, and operation of multiple DRAM devices in a larger memory system. It covers the termino...
Chapter
The physical components of a disk drive are used for recording/retrieving data bits from the magnetic media. This chapter explores some of the data placement alternatives and evaluates some design trade-offs. Retrieving related data that is in close physical proximity is commonly referred to in the disk community as locality of access. Wh...
Article
Full-text available
Accurate and fast system modeling is central to the rapid design space exploration needed for embedded-system design. With fast, complex SoCs playing a central role in such systems, system designers have come to require MIPS-range simulation speeds and near-cycle accuracy. The sophisticated simulation frameworks that have been developed for high-sp...
Book
Is your memory hierarchy stopping your microprocessor from performing at the high level it should be? Memory Systems: Cache, DRAM, Disk shows you how to resolve this problem. The book tells you everything you need to know about the logical design and operation, physical design and operation, performance characteristics and resulting design trade-of...
Article
Accurate and fast system modeling is central to the rapid design space exploration needed for embedded-system design. With fast, complex SoCs playing a central role in such systems, system designers have come to require MIPS-range simulation speeds and near-cycle accuracy. The sophisticated simulation frameworks that have been developed for high-...
Conference Paper
Full-text available
Performance gains in memory have traditionally been obtained by increasing memory bus widths and speeds. The diminishing returns of such techniques have led to the proposal of an alternate architecture, the Fully-Buffered DIMM. This new standard replaces the conventional memory bus with a narrow, high-speed interface between the memory controller a...
Conference Paper
As transistors continue to scale down into the nanometer regime, device leakage currents are becoming the dominant cause of power dissipation in nanometer caches, making it essential to model these leakage effects properly. Moreover, typical microprocessor caches are pipelined to keep up with the speed of the processor, and the effects of pipelinin...