Fig. 3
Source publication
The performance of DRAM memories is characterized by latency and bandwidth. Contemporary DRAM memories satisfy demands for higher bandwidth more successfully than demands for lower latency. In this paper, solutions that may reduce the latency of these memories are investigated. These solutions are two new controller policies called 'Write-miss Only Close-Pa...
Context in source publication
Context 1
... for the 'rgrbc' policy. Namely, it can happen that the tags of the conflicting L2 addresses differ in their higher parts and not in the lower ones. In that case the 'rgrbc' policy gives no improvement: the row buffer miss is not resolved. However, we can now apply the XOR operation to change the 'Bank' field. This process is shown in Fig. 3. Now, if the tags of the missed data block and the concurrent data block differ in their lower bits, the concurrent addresses belong to different groups. If the tags differ in their higher bits, then after the XOR operation they belong to different banks. We have named this policy 'rgrbcx', since it is derived from 'rgrbc', and 'x' stands for ...
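The mechanism can be illustrated with a short sketch in C. The field positions and widths below (a 3-bit bank index at bits 13-15 and a tag beginning at bit 20) and the function name are illustrative assumptions, not the layout used in the paper; the point is only that XOR-ing part of the tag into the 'Bank' field steers addresses whose tags differ in their higher bits into different banks.

#include <stdint.h>
#include <stdio.h>

/* Hypothetical address layout (widths are illustrative):
 *   | tag (from bit 20) | bank (bits 13-15) | row/column (low bits) |  */
#define BANK_BITS   3
#define BANK_SHIFT  13
#define TAG_SHIFT   20
#define BANK_MASK   ((1u << BANK_BITS) - 1)

/* XOR a slice of the higher tag bits into the bank index, so that two
 * addresses whose tags differ only in their high bits are steered to
 * different banks instead of conflicting in the same row buffer. */
static uint32_t remap_bank_xor(uint32_t addr)
{
    uint32_t bank     = (addr >> BANK_SHIFT) & BANK_MASK;
    uint32_t tag_high = (addr >> (TAG_SHIFT + BANK_BITS)) & BANK_MASK;
    uint32_t new_bank = bank ^ tag_high;

    addr &= ~(BANK_MASK << BANK_SHIFT);   /* clear the old bank field  */
    addr |=  (new_bank << BANK_SHIFT);    /* insert the remapped field */
    return addr;
}

int main(void)
{
    /* Two addresses differing only in a high tag bit (bit 23): without
     * remapping both select bank 1; after remapping they diverge. */
    uint32_t a = 0x00002000u;
    uint32_t b = 0x00802000u;
    printf("a -> bank %u\n", (unsigned)((remap_bank_xor(a) >> BANK_SHIFT) & BANK_MASK));
    printf("b -> bank %u\n", (unsigned)((remap_bank_xor(b) >> BANK_SHIFT) & BANK_MASK));
    return 0;
}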
Citations
... However, if there is an entry for the accessed row, it will be closed after the expected number of accesses suggested by the history table. More techniques using access-based page-closure prediction can be found in [20,21,22,23,24,25,26]. ...
Memory controllers have used static page-closure policies to decide whether a row should be left open (open-page policy) or closed immediately (close-page policy) after the row has been accessed. The appropriate choice for a particular access can reduce the average memory latency. However, since application access patterns change at run time, static page policies cannot guarantee optimum execution time. Hybrid page policies have been investigated as a means of covering these dynamic scenarios and are now implemented in state-of-the-art processors. Hybrid page policies switch between open-page and close-page policies while the application is running, by monitoring the access pattern of row hits/conflicts and predicting future behavior. Unfortunately, as the size of DRAM memory increases, fine-grain tracking and analysis of memory access patterns does not remain practical.
We propose a compact memory-address-based encoding technique which can improve or maintain the performance of DRAM page-closure predictors while reducing the hardware overhead in comparison with state-of-the-art techniques. As a case study, we integrate our technique, HAPPY, with a state-of-the-art monitor, the Intel-adaptive open-page policy predictor employed by the Intel Xeon X5650, and a traditional hybrid page policy. We evaluate them across 70 memory-intensive workload mixes consisting of single-thread and multi-thread applications. The experimental results show that applying the HAPPY encoding to the Intel-adaptive page-closure policy can reduce the hardware overhead by 5x for the evaluated 64 GB memory (up to 40x for a 512 GB memory) while maintaining the prediction accuracy.
... Several address mapping techniques have been proposed in previous studies [69,58,66,21,77,72]. ...
... Page interleaving, also known as page mode interleaving, is one commonly used address mapping technique [69,66]. Early SDRAM devices were slow, so it was necessary to interleave data across several SDRAM banks to obtain adequate bandwidth for the processor [14]. ...
As the performance gap between microprocessors and memory continues to increase, main memory accesses incur long latencies which become a factor limiting system performance. Previous studies show that main memory access streams contain significant locality and that SDRAM devices provide parallelism through multiple banks and channels. This locality and parallelism have not been exploited thoroughly by conventional memory controllers. In this thesis, SDRAM address mapping techniques and memory access reordering mechanisms are studied and applied to memory controller design with the goal of reducing observed main memory access latency.
The proposed bit-reversal address mapping attempts to distribute main memory accesses evenly over the SDRAM address space to enable bank parallelism. As memory accesses to distinct banks are interleaved, the access latencies are partially hidden and therefore reduced. By taking cache conflict misses into consideration, bit-reversal address mapping is able to direct potential row conflicts to different banks, further improving performance.
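A minimal sketch of the bit-reversal idea follows, assuming a hypothetical address split; the field widths and the reversed span are illustrative, not the thesis's actual configuration. Reversing the index bits moves the high-order address bits, in which cache-conflicting addresses tend to differ, down into the bank-selection position.

#include <stdint.h>

/* Reverse the lowest `width` bits of `value`. */
static uint32_t reverse_bits(uint32_t value, unsigned width)
{
    uint32_t out = 0;
    for (unsigned i = 0; i < width; i++)
        out = (out << 1) | ((value >> i) & 1u);
    return out;
}

/* Hypothetical bit-reversal mapping: keep the column offset contiguous
 * and reverse the bank/row index bits above it. */
#define COLUMN_BITS 10   /* illustrative: 1 KB rows left contiguous */
#define INDEX_BITS  18   /* illustrative: bank + row index width    */

static uint32_t bit_reversal_map(uint32_t phys_addr)
{
    uint32_t column = phys_addr & ((1u << COLUMN_BITS) - 1);
    uint32_t index  = (phys_addr >> COLUMN_BITS) & ((1u << INDEX_BITS) - 1);
    return (reverse_bits(index, INDEX_BITS) << COLUMN_BITS) | column;
}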
The proposed burst scheduling is a novel access reordering mechanism which creates bursts by clustering accesses directed to the same rows of the same banks. Subject to a threshold, reads are allowed to preempt writes, and qualified writes are piggybacked at the end of the bursts. A sophisticated access scheduler selects accesses based on priorities and interleaves accesses to maximize SDRAM data bus utilization. Consequently, burst scheduling reduces the row conflict rate, increasing and exploiting the available row locality.
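The selection logic can be sketched as follows, under stated assumptions: the queue structure, the scoring rule, and the write threshold are illustrative, not the scheduler actually evaluated in the thesis.

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative pending-request record. */
typedef struct {
    uint32_t bank;
    uint32_t row;
    bool     is_read;
    bool     valid;
} Request;

#define QUEUE_SIZE       16
#define WRITE_THRESHOLD   8   /* force writes through once this many are pending */

/* Pick the next request: prefer one that hits the currently open row of
 * its bank (extending the current burst); among candidates, reads preempt
 * writes unless the pending-write count has reached the threshold, at
 * which point writes are piggybacked into the tail of the burst. */
static int pick_next(const Request q[QUEUE_SIZE],
                     const uint32_t open_row[],
                     size_t pending_writes)
{
    int best = -1, best_score = -1;
    for (int i = 0; i < QUEUE_SIZE; i++) {
        if (!q[i].valid)
            continue;
        bool row_hit    = (open_row[q[i].bank] == q[i].row);
        bool read_first = q[i].is_read && (pending_writes < WRITE_THRESHOLD);
        int  score      = (row_hit ? 2 : 0) + (read_first ? 1 : 0);
        if (score > best_score) { best = i; best_score = score; }
    }
    return best;   /* index of the chosen request, or -1 if the queue is empty */
}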
Using revised SimpleScalar and M5 simulators, both techniques are evaluated and compared with existing academic and industrial solutions. With SPEC CPU2000 benchmarks, bit-reversal reduces execution time by 14% on average over traditional page-interleaving address mapping. Burst scheduling also achieves a 15% reduction in execution time over conventional bank-in-order scheduling. Working constructively together, bit-reversal and burst scheduling achieve a 19% speedup across the simulated benchmarks.
... This means that we can influence DRAM performance (latency) by controlling the placement of data into banks and rows. This is the basis of papers in which address remappings are considered, transforming memory addresses into banks, rows and columns in a way that optimizes DRAM performance for certain memory access patterns [4], [5]. ...
... The Open Row Policy gives good results if memory accesses exhibit good locality, while the Close Row Autoprecharge Policy gives good results if DRAM accesses are mostly random. In our earlier papers [5], [6] we have already considered various ways of obtaining hybrid policies, which combine the advantages of both policies. The goal is to achieve a policy more efficient than either basic policy and thereby decrease DRAM latency. ...
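One simple form such a hybrid can take is sketched below; the per-bank 2-bit saturating counter and its threshold are assumptions for illustration, not the mechanism of the cited papers. Row hits bias the bank toward open-row behavior, row conflicts toward close-row autoprecharge behavior.

#include <stdbool.h>
#include <stdint.h>

/* Per-bank 2-bit saturating counter (an illustrative choice of width). */
typedef struct { uint8_t counter; } BankPolicy;

static void record_outcome(BankPolicy *p, bool row_hit)
{
    if (row_hit) {
        if (p->counter < 3) p->counter++;   /* hits favor keeping rows open */
    } else {
        if (p->counter > 0) p->counter--;   /* conflicts favor closing      */
    }
}

/* true  -> leave the row open (Open Row Policy behavior)
 * false -> close it immediately (Close Row Autoprecharge behavior) */
static bool keep_row_open(const BankPolicy *p)
{
    return p->counter >= 2;
}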
... We have simulated the execution of six benchmark programs from the SPEC95 suite: cc1, compress, ijpeg, li, m88ksim, and perl. The characteristics of these programs can be found in [5], [6]. ...
Better insight into program behavior can help in overcoming the large speed difference between central processors and main memories implemented with DRAM chips. It allows us to predict the required next actions based on observed main memory access patterns, which can hide some time components of DRAM memory accesses. The authors of this paper earlier proposed a simple dead time predictor which helps in predicting when to close the opened DRAM row. In this paper that predictor is further improved by adding a zero live time predictor. The zero live time predictor, by its essence, complements the dead time predictor.
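The dead-time idea can be sketched as follows; the per-bank state, the cycle granularity, and the averaging update rule are assumptions made for this illustration, not the predictor design from the paper.

#include <stdbool.h>
#include <stdint.h>

#define NUM_BANKS 8

/* Per-bank state for a dead-time (row-close) predictor: track how long
 * the open row has been idle, and close it once the idle interval
 * reaches the learned estimate of the row's dead time. */
typedef struct {
    uint32_t open_row;
    uint32_t idle_cycles;      /* cycles since the last access to open_row */
    uint32_t predicted_dead;   /* learned idle interval before closing     */
} BankState;

static BankState bank[NUM_BANKS];

/* Called every controller cycle for each bank with an open row. */
static bool should_close_row(uint32_t b)
{
    bank[b].idle_cycles++;
    return bank[b].idle_cycles >= bank[b].predicted_dead;
}

/* Called on each access; on a row conflict, blend the observed idle
 * interval into the estimate (a simple learning rule assumed here). */
static void on_access(uint32_t b, uint32_t row)
{
    if (row != bank[b].open_row) {
        bank[b].predicted_dead =
            (bank[b].predicted_dead + bank[b].idle_cycles) / 2;
        bank[b].open_row = row;
    }
    bank[b].idle_cycles = 0;
}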
... Several studies have examined more involved address mappings with the objective of reducing SDRAM row conflicts [3,16,18,19]. Wei-fen Lin [18] pointed out that address mappings affect performance significantly and proposed an address mapping scheme that XORs the device and bank index with the lower bits of the row address. This mapping retains the contiguous-address striping properties across banks at the granularity of an SDRAM row and distributes the blocks that map to a given cache set evenly across the banks. ...
The performance contributions of SDRAM address mapping techniques in the main memory of an embedded system are studied and examined. While spatial locality in the access stream increases the SDRAM row hit rate, it also increases row conflicts. The mapping of physical address bits into SDRAM column, row, bank and rank indices impacts system performance significantly. A novel address mapping scheme, called bit-reversal, is described and experimentally compared against known methods. Bit-reversal address mapping increases the SDRAM row hit rate from 43% to 66% by distributing conflicting memory accesses over independent SDRAM banks, and it reduces the average memory access latency by 26%-29% over other methods, resulting in an 11.7%-13.5% reduction of total execution time. The configuration space of bit-reversal address mapping is explored. Finally, limited studies examining the impact of address mapping techniques in conjunction with SDRAM controller policy and virtual paging illustrate that the mapping is better suited to embedded systems without virtual memory than to desktop workstations incorporating paging mechanisms.
... This shows that DRAM memories are not random-access memories in the strict sense, characterized by an access time independent of the addresses of the accessed locations; rather, they are memories with banks, rows and columns as the dimensions that determine the access time. Since the access time T_A can take three values, t_RP + t_RA + t_CL, t_RA + t_CL, or only t_CL, solutions are investigated which, in certain application areas, can reduce the access latency toward the lower bound t_CL [1,2]. Current DDR3 and DDR2 SDRAM memories have t_RP, t_RA and t_CL times of 12-18 ns. ...
Abstract – Dynamic (SDRAM) memory accesses include three activities: precharge, row activation and column access, each taking about 15 ns or more. For a sequence of successive accesses to columns of the same row of a bank, precharge and row activation must precede only the first access in the sequence; subsequent accesses require only column accesses. One task of the SDRAM controller is to minimize latency or to maximize bandwidth during memory accesses. In their previous papers, the authors proposed the use of hardware predictors to minimize the latency of SDRAM memory accesses. The predictors dynamically record SDRAM access history and, based on it, predict when to close an opened row (eliminating the precharge time) and which row to open next (eliminating the row activation time) before the next access to the same bank. In this paper, the design of the hardware for the closed-row predictor and the opened-row predictor, as elements of an SDRAM memory controller, is explained at the level of functional blocks.
Increasing datacenter compute requirements have led to tremendous growth in the number of CPU cores on chip-multiprocessors. With a large number of threads running on a single node, it is critical to achieve high memory bandwidth efficiency on large-scale CMPs to support continued growth in the number of CPU cores. In this paper, we present several mechanisms that improve memory efficiency by improving the page hit rate for multi-core processors. In particular, we present memory page policies that dynamically adapt to the runtime workload characteristics and use thread awareness to reduce contention between the memory address streams of different threads. Unlike contemporary DRAM page policies such as static or timer-based ones, the proposed framework profiles the memory stream at runtime and uncovers opportunities to close or keep open DRAM pages, resulting in reduced page conflicts and improved efficiency. We implement the proposed policies in a cycle-accurate performance model simulating an 8-core processor. Our results show that the proposed adaptive page policies increase the performance of high-memory-bandwidth workloads in SPECint2006 by up to 3%, and can attain 83% average performance relative to a “perfect” page prediction policy. We further show that the performance improvement from these techniques increases with the number of cores and with making the policies thread-aware in a many-core processor. The implementation cost of our techniques is extremely low, an area overhead of only 69 bits, making them extremely attractive for real-life products.
In the arsenal of solutions for improving computer memory system performance, predictors have gained an increasing role in recent years. They enable hiding the latencies of accessing cache or main memory. Recently, the technique of observing temporal parameters of cache memory accesses and tag patterns has been applied by some authors to predict data prefetching. In this paper, the possibility of applying analogous techniques to controlling the opening and closing of DRAM rows is investigated. The obtained results confirm such a possibility, in the form of a complete predictor which predicts not only when to close the currently open row but also which row to open next. Using such a predictor can decrease the average DRAM latency, which is very important in many areas, including telecommunications.