Memory Systems and Interconnects for Scale-Out Servers

The information revolution of the last decade has been fueled by the digitization of almost all human activities through a wide range of Internet services. The backbone of this information age is the scale-out datacenter, which must collect, store, and process massive amounts of data. These datacenters distribute vast datasets across a large number of servers, typically as memory-resident shards, so as to maintain strict quality-of-service guarantees. While data is driving skyrocketing demand for scale-out servers, processor and memory manufacturers have reached fundamental efficiency limits and can no longer increase server energy efficiency at a sufficient pace. As a result, energy has emerged as the main obstacle to the scalability of information technology (IT), with huge economic implications. Delivering sustainable IT calls for a paradigm shift in computer system design. As memory has taken a central role in IT infrastructure, memory-centric architectures are required to fully utilize IT's costly memory investment. In response, processor architects are resorting to manycore architectures to leverage the abundant request-level parallelism found in data-centric applications. Manycore processors fully utilize available memory resources, thereby increasing IT efficiency by almost an order of magnitude. Because manycore server chips execute a large number of concurrent requests, they frequently access the last-level cache to fetch instructions (because of large instruction footprints) and off-chip memory to access dataset objects (because of the lack of temporal reuse in on-chip caches). As a result, on-chip interconnects and the memory system are emerging as major performance and energy-efficiency bottlenecks in servers. This thesis seeks to architect on-chip interconnects and memory systems that are tuned for the requirements of memory-centric scale-out servers.
By studying a wide range of data-centric applications, we uncover phenomena common to these applications and examine their implications for on-chip network and off-chip memory traffic. Finally, we propose specialized on-chip interconnects and memory systems that leverage common traffic characteristics, thereby improving server throughput and energy efficiency.


... They play a pivotal role in ensuring the performance and power scalability of several manycore chips as they provide the path to performance-critical LLC-resident instructions. Communication power dissipation is emerging as a significant fraction of the total chip power budget [2], [3]. In order to meet the high data rates and low energy-per-bit requirements projected by the International Technology Roadmap for Semiconductors (ITRS), there is a need to minimize total losses, comprising conductor and dielectric losses, at current and future technology nodes [4], [5]. ...
In planar on-chip copper interconnects, conductor losses due to surface roughness demand explicit consideration for accurate modeling of performance metrics. This is particularly pertinent for high-performance manycore processors/servers, where on-chip interconnects are increasingly emerging as one of the key performance bottlenecks. This paper presents a novel analytical model for parameter extraction in current and future on-chip interconnects. Our proposed model aids in analyzing the impact of spatial and vertical surface roughness on their electrical performance. Our analysis clearly shows that as technology nodes scale down, the effect of surface roughness becomes dominant and cannot be ignored. Based on AFM images of fabricated ultra-thin copper sheets, we have extracted roughness parameters to define realistic surface profiles using the well-known Mandelbrot-Weierstrass (MW) fractal function. For our analysis, we have considered four current and future interconnect technology nodes (i.e., 45nm, 22nm, 13nm, 7nm) and evaluated the impact of surface roughness on typical performance metrics, such as delay, energy, and bandwidth. Results obtained using our model are verified against the industry-standard field solver Ansys HFSS as well as available experimental data, showing accuracy within 9%. We present signal integrity analysis using the eye diagram at 1Gbps, 5Gbps, 10Gbps, and 18Gbps bit rates to find the increase in frequency-dependent losses due to surface roughness. Finally, simulating a standard three-line on-chip interconnect structure, we also report the computational overhead incurred for different values of roughness and technology nodes.
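As a rough illustration of the fractal surface profile mentioned above, the sketch below evaluates one common one-dimensional, truncated form of the Weierstrass-Mandelbrot function from the roughness literature, z(x) = G^(D-1) * sum over n of cos(2*pi*gamma^n*x) / gamma^((2-D)*n). The abstract does not state which form or which parameter values the paper uses, so the function name `wm_profile`, the defaults for `D`, `G`, `gamma`, and the truncation `n_terms` are all assumptions chosen for illustration only.

```python
import math


def wm_profile(xs, D=1.5, G=1e-9, gamma=1.5, n_low=0, n_terms=30):
    """Truncated 1D Weierstrass-Mandelbrot height profile (illustrative sketch).

    xs      : iterable of positions (same length unit as G)
    D       : fractal dimension of the profile, must satisfy 1 < D < 2
    G       : roughness scaling constant (assumed value, not from the paper)
    gamma   : frequency ratio between successive terms, must be > 1
    n_low   : index of the lowest spatial frequency included
    n_terms : number of terms kept (the ideal series is infinite)
    """
    if not 1.0 < D < 2.0:
        raise ValueError("D must lie strictly between 1 and 2")
    if gamma <= 1.0:
        raise ValueError("gamma must be greater than 1")

    scale = G ** (D - 1.0)
    heights = []
    for x in xs:
        total = 0.0
        for n in range(n_low, n_low + n_terms):
            freq = gamma ** n  # spatial frequency of the n-th term
            total += math.cos(2.0 * math.pi * freq * x) / gamma ** ((2.0 - D) * n)
        heights.append(scale * total)
    return heights


def rms_roughness(heights):
    """Root-mean-square deviation of the heights about their mean."""
    mean = sum(heights) / len(heights)
    return math.sqrt(sum((h - mean) ** 2 for h in heights) / len(heights))


if __name__ == "__main__":
    # Sample 200 points over an arbitrary unit-length window.
    positions = [i / 200.0 for i in range(200)]
    profile = wm_profile(positions)
    print("RMS roughness of the sampled profile:", rms_roughness(profile))
```

In a workflow like the one the abstract describes, `D` and `G` would presumably be fitted to measured AFM data rather than chosen by hand; that fitting step is not shown here.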
This paper presents an overview of the problem of surface roughness in ultra-scaled copper (Cu) interconnects. Surface roughness can severely degrade the electrical and thermal performance of Cu interconnects. This penalty has largely been ignored, which has resulted in fairly optimistic models and estimates. It is in this context that this paper and our ongoing work gain significance. The authors attempt to present the big picture of interconnect surface roughness and its implications for various design metrics.