Conference Paper

Towards Data Tiling for Whole Programs in Scratchpad Memory Allocation.

DOI: 10.1007/978-3-540-74309-5_8
Conference: Advances in Computer Systems Architecture, 12th Asia-Pacific Conference, ACSAC 2007, Seoul, Korea, August 23-25, 2007, Proceedings
Source: DBLP

ABSTRACT Data tiling is an array layout transformation technique that partitions an array into smaller subarray blocks. It was originally
proposed to improve the cache performance of regular loops. Recently, researchers have applied this technique to scratchpad
memory (SPM) allocation. Arrays whose sizes exceed a given SPM size can be tiled or divided into smaller subarray blocks or
tiles and the program performance can be significantly improved by placing the smaller subarray tiles in SPM. Existing data
tiling techniques are applicable to regularly-accessed arrays in individual loop nests. In embedded applications, arrays are
often accessed in multiple loop nests via possibly aliased pointers. Tiling arrays in a loop nest alone will often affect
the tiling and allocation decisions for arrays accessed in other loop nests. Moreover, tiling arrays accessed via aliased
pointers is difficult since their access patterns are unknown at compile time. This paper presents a new data tiling approach
to address these practical issues. We perform alias profiling to detect the most likely memory access patterns and use an
ILP solver to select the best tiling schemes for all loop nests in the program as a whole. We have integrated data tiling
in an existing SPM allocation framework. Our preliminary experimental results show that our approach can significantly improve
the performance of a set of programs selected from the MediaBench suite.
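
As a rough illustration of the basic idea described in the abstract (splitting an array that is larger than the SPM into tiles that each fit), the following is a minimal sketch, not the paper's algorithm. The names `tile_bounds`, `process_tiled`, `spm_capacity` and `kernel` are hypothetical, the SPM is modeled as an ordinary Python list, and alias profiling and ILP-based selection of tiling schemes are not modeled at all.

```python
from typing import Callable, List


def tile_bounds(n: int, tile_size: int) -> List[range]:
    """Split the index range [0, n) into consecutive tiles of at most tile_size elements."""
    if tile_size <= 0:
        raise ValueError("tile_size must be positive")
    return [range(start, min(start + tile_size, n)) for start in range(0, n, tile_size)]


def process_tiled(data: List[int], spm_capacity: int, kernel: Callable[[int], int]) -> List[int]:
    """Apply kernel to every element, one SPM-sized tile at a time.

    Assumption: spm_capacity is the number of array elements the (simulated)
    scratchpad can hold; a real allocator would work in bytes and would also
    account for other arrays competing for the same space.
    """
    out = [0] * len(data)
    for tile in tile_bounds(len(data), spm_capacity):
        spm = data[tile.start:tile.stop]      # copy the tile into the simulated SPM
        spm = [kernel(x) for x in spm]        # compute on the SPM-resident tile
        out[tile.start:tile.stop] = spm       # copy the results back to main memory
    return out


if __name__ == "__main__":
    print(tile_bounds(10, 4))
    print(process_tiled(list(range(10)), 4, lambda x: x * 2))
```

In this sketch every tile occupies the same simulated buffer in turn, which is what lets an array of 10 elements be processed with a capacity of only 4.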

  • ABSTRACT: In this paper, we propose an effective data pipelining technique, SPDP (scratch-pad data pipelining), for dynamic scratch-pad memory (SPM) management with DMA (Direct Memory Access). In SPDP, we group multiple iterations of a loop into a block for SPM allocation, and implement a data pipeline by overlapping the execution of CPU instructions and DMA operations. We have implemented our SPDP technique in the IMPACT compiler and conducted experiments using a set of benchmarks from DSPstone, MiBench and MediaBench on the cycle-accurate VLIW simulator of Trimaran. The experimental results show that our technique achieves significant performance improvement compared with previous work.
    Concurrency and Computation: Practice and Experience 01/2010; 22:1874-1892.
  • ABSTRACT: Existing scratchpad memory (SPM) allocation algorithms for arrays, whether they rely on well-crafted heuristics or resort to integer linear programming (ILP) techniques, typically assume that every array is small enough to fit directly into the SPM. As a result, some arrays have to be spilled entirely to off-chip memory to make room for other arrays to stay in the SPM, which sometimes results in poor SPM utilization. In this paper, we introduce a new comparability graph coloring allocator that, for the first time, integrates data tiling and SPM allocation for arrays by tiling arrays on demand to improve utilization of the SPM. The novelty lies in repeatedly identifying the heaviest path in an array interference graph and then reducing its weight by tiling certain arrays on the path appropriately with respect to the size of the SPM. The effectiveness of our allocator, which is presently restricted to tiling 1-D arrays, is validated using a number of selected benchmarks for which existing allocators are ineffective.
    Proceedings of the 2010 International Conference on Compilers, Architecture, and Synthesis for Embedded Systems, CASES 2010, Scottsdale, AZ, USA, October 24-29, 2010; 01/2010
  • ABSTRACT: Scratch-pad memory (SPM) is widely used in embedded systems. Reducing power consumption in SPM systems is a timely and crucial problem, since high power consumption can reduce system reliability and increase the cost and size of heat sinks. In this paper, we propose an effective power-reduction approach that scales down voltage and frequency as much as possible. First, we pipeline data transfer and processing. Second, we find the relative time slack between fast data processing and slow data transfer, and then apply both single and dynamic scaling to reduce power consumption. We evaluate our approach on the Trimaran simulator, and the experimental results show that it achieves significant power reduction while its run-time performance exceeds that of previous work.
    2011 International Conference on Parallel Processing Workshops, ICPPW 2011, Taipei, Taiwan, Sept. 13-16, 2011; 01/2011