Address code optimization using code scheduling for digital signal processors
ABSTRACT: We propose an effective address code generation algorithm for digital signal processors (DSPs) that minimizes the number of addressing instructions. Unlike previous work, in which code scheduling and offset (address) assignment are performed sequentially without any interaction between them, our approach tightly couples code scheduling with offset assignment, so that scheduling can be exploited more effectively to reduce addressing instructions. We accomplish this with a new code scheduling algorithm that produces an efficient sequence of variable accesses and thereby minimizes addressing instructions. Experimental results on benchmark DSP programs show average address code size improvements of 23.7% and 47.1%, the latter relative to a naive storage assignment algorithm.
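To make the cost model concrete, here is a minimal Python sketch. It assumes a single address register whose auto-increment/decrement by one is free, so an explicit addressing instruction is needed only when two consecutive accesses are more than one slot apart (the initial register load is ignored). The names `address_cost`, `valid_orders`, and `best_layout`, the access trace, and the dependence chains are all hypothetical, and both searches are brute force: this only illustrates why letting the schedule vary can lower addressing cost, not the scheduling algorithm proposed here.

```python
from itertools import permutations


def address_cost(sequence, offset):
    """Count explicit address instructions for one address register.

    Assumes a +/-1 auto-modify is free, so a transition costs one
    instruction only when the two offsets differ by more than 1.
    The initial load of the address register is not counted.
    """
    return sum(
        1 for a, b in zip(sequence, sequence[1:]) if abs(offset[a] - offset[b]) > 1
    )


def valid_orders(accesses, preds):
    """Yield every reordering of the accesses that respects preds.

    preds[i] is the set of access indices that must come before access i.
    Brute force over all permutations: only for tiny examples.
    """
    for order in permutations(range(len(accesses))):
        position = {idx: pos for pos, idx in enumerate(order)}
        if all(position[p] < position[i] for i in preds for p in preds[i]):
            yield [accesses[i] for i in order]


def best_layout(sequence, variables):
    """Exhaustively pick the offset assignment with the lowest cost."""
    best = None
    for perm in permutations(variables):
        offset = {v: k for k, v in enumerate(perm)}
        cost = address_cost(sequence, offset)
        if best is None or cost < best[0]:
            best = (cost, offset)
    return best


# Hypothetical access trace and dependences: accesses 0 -> 1 -> 2 and
# 3 -> 4 -> 5 form two independent chains that may be interleaved.
accesses = ["a", "b", "c", "a", "d", "b"]
preds = {1: {0}, 2: {1}, 4: {3}, 5: {4}}
variables = sorted(set(accesses))

# Sequential approach: keep the given order, then choose the layout.
fixed_cost, _ = best_layout(accesses, variables)
# Coupled approach: choose the schedule and the layout together.
joint_cost = min(best_layout(order, variables)[0]
                 for order in valid_orders(accesses, preds))
print(fixed_cost, joint_cost)  # 2 0
```

With the given order, even the best layout needs two explicit address instructions; interleaving the chains as a, a, d, b, b, c lets the layout a, d, b, c make every transition free.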
- Source available from: psu.edu
ABSTRACT: Random access to local variables stored in a stack frame is more difficult for compiled functions when the target processor lacks register-plus-offset addressing. One alternative technique employs a roving pointer that the program increments or decrements as needed between stack accesses. Processors that support auto-increment and auto-decrement addressing modes can often perform these increments for free when consecutive accesses are to adjacent stack locations. For these processors, the compiler's choice of ordering for the local variables in the stack frame can substantially affect the execution speed of the compiled program. This paper is concerned with finding an ordering of the local variables in the frame that maximizes the likelihood that two consecutive references at run time will be to the same or to adjacent stack locations. We have formulated a solution to this problem in terms of finding a Hamiltonian path in a graph. Although this problem is NP-complete in general, we have developed a heuristic algorithm that delivers good results with acceptable performance.
Software Practice and Experience, 02/1992; 22(2):101-110. DOI:10.1002/spe.4380220202
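The Hamiltonian-path objective can be illustrated with a small Python sketch. It assumes the graph's vertices are the local variables and that an edge's weight counts how often its two variables are referenced back to back; a frame ordering is then a Hamiltonian path, and its weight plus the same-variable repeats is the number of consecutive references that land on the same or an adjacent slot. The helper names and the reference trace are hypothetical, and `best_frame_order` is an exhaustive search, not the heuristic the abstract describes.

```python
from collections import Counter
from itertools import permutations


def access_graph(sequence):
    """Weight each unordered pair of locals by how often the two are
    referenced back to back. Repeated references to one variable are
    counted separately, since they always hit the same slot."""
    edges = Counter()
    self_refs = 0
    for a, b in zip(sequence, sequence[1:]):
        if a == b:
            self_refs += 1
        else:
            edges[frozenset((a, b))] += 1
    return edges, self_refs


def path_weight(order, edges):
    """Total weight of the graph edges joining neighbouring frame slots."""
    return sum(edges[frozenset(pair)] for pair in zip(order, order[1:]))


def best_frame_order(sequence):
    """Exact search over all Hamiltonian paths (every permutation).
    Factorial cost, so only usable for a handful of locals."""
    edges, _ = access_graph(sequence)
    variables = sorted(set(sequence))
    return max(permutations(variables), key=lambda o: path_weight(o, edges))


# Hypothetical run-time reference trace for four locals.
refs = ["x", "y", "x", "z", "w", "z", "x", "y"]
edges, self_refs = access_graph(refs)
order = best_frame_order(refs)
hits = self_refs + path_weight(order, edges)
print(hits, len(refs) - 1)  # 7 7
```

Here all seven transitions reach the same or an adjacent slot under the best ordering; with more locals the number of permutations grows factorially, which is why a heuristic is needed in practice.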
- ABSTRACT: Over the last decade, significant advances have been made in compilation technology for capitalizing on instruction-level parallelism (ILP). The vast majority of ILP compilation research has been conducted in the context of general-purpose computing, and more specifically the SPEC benchmark suite. At the same time, a number of microprocessor architectures have emerged whose VLIW and SIMD structures are well matched to the needs of ILP compilers. Most of these processors are targeted at embedded applications such as multimedia and communications rather than at general-purpose systems. Conventional wisdom, and a history of hand optimization of inner loops, suggests that ILP compilation techniques are well suited to these applications. Unfortunately, there currently exists a gap between the compiler community and embedded applications developers. This paper presents MediaBench, a benchmark suite that has been designed to fill this gap. This suite has been constructed through a ...
11/1997; DOI:10.1109/MICRO.1997.645830
- Conference Paper: Storage Assignment to Decrease Code Size
ABSTRACT: DSP architectures typically provide indirect addressing modes with auto-increment and auto-decrement. In addition, an indexing mode is generally not available, and there are usually few, if any, general-purpose registers. Hence, it is necessary to use address registers and perform address arithmetic to access automatic variables. Subsuming the address arithmetic into auto-increment and auto-decrement modes reduces the size of the generated code. In this article we present a formulation of the problem of optimal storage assignment such that explicit instructions for address arithmetic are minimized. We prove that for the case of a single address register the decision problem is NP-complete, even for a single basic block. We then generalize the problem to multiple address registers. For both cases, heuristic algorithms are given and experimental results are presented.
06/1995
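For the single-address-register case, a common heuristic is a greedy maximum-weight path cover over the access graph. The Python sketch below follows that idea under stated assumptions: auto-increment/decrement by one is free, ties between equal-weight edges are broken alphabetically, and the separate paths are simply concatenated into one layout. It is not necessarily the exact heuristic given in the paper, and `soa_greedy`, `address_cost`, and the access trace are hypothetical.

```python
from collections import Counter


def soa_greedy(sequence):
    """Greedy path cover: take edges heaviest first, keeping an edge only
    if both endpoints stay at degree <= 2 and no cycle forms. The kept
    edges form vertex-disjoint paths, laid out one after another."""
    variables = sorted(set(sequence))
    weights = Counter(
        frozenset((a, b)) for a, b in zip(sequence, sequence[1:]) if a != b
    )
    parent = {v: v for v in variables}  # union-find for cycle detection

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    degree = Counter()
    adj = {v: [] for v in variables}
    # Heaviest edges first; the sorted pair is an assumed tie-breaker.
    for edge, _ in sorted(weights.items(), key=lambda kv: (-kv[1], sorted(kv[0]))):
        a, b = sorted(edge)
        if degree[a] < 2 and degree[b] < 2 and find(a) != find(b):
            parent[find(a)] = find(b)
            degree[a] += 1
            degree[b] += 1
            adj[a].append(b)
            adj[b].append(a)

    # Walk each path from an endpoint (degree <= 1) to build the layout.
    layout, seen = [], set()
    for start in variables:
        if start in seen or degree[start] > 1:
            continue
        prev, cur = None, start
        while cur is not None:
            layout.append(cur)
            seen.add(cur)
            nxt = [n for n in adj[cur] if n != prev]
            prev, cur = cur, (nxt[0] if nxt else None)
    return {v: k for k, v in enumerate(layout)}


def address_cost(sequence, offset):
    """Explicit address instructions needed when only +/-1 steps are free."""
    return sum(
        1 for a, b in zip(sequence, sequence[1:]) if abs(offset[a] - offset[b]) > 1
    )


seq = ["a", "b", "c", "a", "d", "b", "a", "c"]
offset = soa_greedy(seq)
naive = {v: k for k, v in enumerate(dict.fromkeys(seq))}  # first-use order
print(address_cost(seq, naive), address_cost(seq, offset))  # 4 2
```

On this trace the greedy layout c, a, b, d needs two explicit address instructions, against four for the first-use layout.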