Publications (13)
0 Total impact
-
ABSTRACT: Loop fusion is commonly used to improve the instruction-level parallelism of loops in high-performance embedded computing systems. Loop fusion, however, is not always directly applicable because fusion-prevention dependencies may exist among loops, and most existing techniques are limited in how fully they exploit its advantages. In this paper, we present a general loop fusion technique for loops or nested loops based on the loop dependency graph model, retiming, and multi-dimensional retiming concepts. We show that any "J+K" model loop can be legally fused using our legalizing fusion technique. Polynomial-time algorithms are developed to solve the loop fusion problem for "J+K" model loops, considering both the timing and the code size of the final code. Our technique produces the final code and calculates the resultant code size directly from the retiming values. The experimental results show that our loop fusion technique always significantly reduces the schedule length.
10/2002;
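As a rough illustration of why a fusion-prevention dependency blocks direct fusion and how shifting one statement by an iteration can legalize it, consider the minimal sketch below. The arrays, loop bodies, and function names are hypothetical and are not taken from the paper; the one-iteration shift only stands in for the retiming values the abstract refers to.

```python
def separate_loops(b, n):
    """Reference version: two loops, the second reads a[i + 1] from the first."""
    a = [0] * (n + 1)
    c = [0] * n
    for i in range(n + 1):
        a[i] = b[i] * 2
    for i in range(n):
        c[i] = a[i] + a[i + 1]
    return c


def naive_fusion(b, n):
    """Illegal fusion: c[i] reads a[i + 1] before the fused loop has written it."""
    a = [0] * (n + 1)
    c = [0] * n
    for i in range(n):
        a[i] = b[i] * 2
        c[i] = a[i] + a[i + 1]  # a[i + 1] still holds its initial 0 here
    a[n] = b[n] * 2
    return c


def retimed_fusion(b, n):
    """Legal fusion: the second statement is delayed by one iteration."""
    a = [0] * (n + 1)
    c = [0] * n
    a[0] = b[0] * 2  # prologue created by the one-iteration shift
    for i in range(1, n + 1):
        a[i] = b[i] * 2
        c[i - 1] = a[i - 1] + a[i]  # both operands are already computed
    return c
```

In this toy case the shifted version matches the two-loop reference, while the naive fusion does not. The price is one extra prologue statement, which is the kind of code size cost the abstract says can be computed from the retiming values.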
-
ABSTRACT: In real-time digital signal processing (DSP) architectures using heterogeneous functional units (FUs), it is critical to select the best FU for each task. However, some tasks may not have fixed execution times. This paper models each varied execution time as a probabilistic random variable and solves the heterogeneous assignment with probability (HAP) problem. The solution of the HAP problem assigns a proper FU type to each task such that the total cost is minimized while the timing constraint is satisfied with a guaranteed confidence probability. The solutions to the HAP problem are useful for both hard real-time and soft real-time systems. Two algorithms, one optimal and one heuristic, are proposed to solve the general problem. The experiments show that our algorithms can effectively reduce the total cost with guaranteed confidence probabilities satisfying timing constraints. For example, our algorithms achieve an average reduction of 33.5% in total cost with a 90% confidence probability of satisfying timing constraints, compared with previous work using the worst-case scenario.
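The HAP formulation above can be made concrete with a small exhaustive search. This is only a sketch under stated assumptions, not either of the paper's algorithms: the tasks are assumed to run back to back on a simple path, their execution times are assumed independent, and the function names and data layout are hypothetical.

```python
from itertools import product


def convolve(d1, d2):
    """Distribution of the sum of two independent discrete execution times."""
    out = {}
    for t1, p1 in d1.items():
        for t2, p2 in d2.items():
            out[t1 + t2] = out.get(t1 + t2, 0.0) + p1 * p2
    return out


def best_assignment(tasks, deadline, confidence):
    """Exhaustively pick one FU type per task (assumed brute force).

    tasks[i] is a list of (cost, {time: probability}) options for task i.
    Returns (cost, choice, probability) for the cheapest assignment whose
    total time meets the deadline with at least the given confidence,
    or None if no assignment qualifies.
    """
    best = None
    for choice in product(*[range(len(options)) for options in tasks]):
        cost = 0
        total = {0: 1.0}
        for options, k in zip(tasks, choice):
            fu_cost, fu_times = options[k]
            cost += fu_cost
            total = convolve(total, fu_times)
        prob = sum(p for t, p in total.items() if t <= deadline)
        if prob >= confidence and (best is None or cost < best[0]):
            best = (cost, choice, prob)
    return best
```

The search is exponential in the number of tasks, so it is useful only for checking small instances. Lowering the confidence below 100% lets cheaper FU types qualify, which is the source of the cost reductions the abstract reports relative to worst-case assignment.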
-
ABSTRACT: Usually the covering problem requires all elements in a system to be covered. In some situations, it is very difficult to find a solution, or impossible to cover all given elements because of resource constraints. In this paper, we study the partial covering problem. This problem is also referred to as the robust k-center problem and can be applied to many fields. The partial covering problem becomes even harder when we need to determine which subset of the available elements should share resources. Several approximation algorithms are proposed in this paper to cover the most elements. For some real-time systems, such as battlefield communication systems, the presented polynomial-time algorithm can be applied efficiently. The complexity analysis illustrates the improvement made by our algorithms over other algorithms for the partial covering problem in the literature. The experimental results show that our algorithms perform much better than the existing 3-approximation algorithm for the robust k-center problem.
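To illustrate what partial covering asks for, the sketch below applies a plain greedy heuristic: with a fixed radius and at most k centers, it repeatedly picks the point whose disk covers the most still-uncovered points. This is an assumed textbook-style heuristic, not the approximation algorithms proposed in the paper, and the function name and parameters are hypothetical.

```python
import math


def greedy_partial_cover(points, k, radius):
    """Choose up to k centers among the points; return (centers, covered count)."""
    uncovered = set(range(len(points)))
    centers = []
    for _ in range(k):
        best_center, best_hit = None, set()
        for c in range(len(points)):
            # Points still uncovered that fall inside the disk around candidate c.
            hit = {j for j in uncovered
                   if math.dist(points[c], points[j]) <= radius}
            if len(hit) > len(best_hit):
                best_center, best_hit = c, hit
        if best_center is None:  # nothing left to cover
            break
        centers.append(best_center)
        uncovered -= best_hit
    return centers, len(points) - len(uncovered)
```

Outliers that no disk can reach cheaply are simply left uncovered, which is the relaxation that distinguishes partial covering from the classical covering problem.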
-
ABSTRACT: In real-time digital signal processing (DSP) architectures using heterogeneous functional units (FUs), it is critical to select the best FU for each task. However, some tasks may not have fixed execution times. This paper models each varied execution time as a probabilistic random variable and solves the heterogeneous assignment with probability (HAP) problem. The solutions to the HAP problem are useful for both hard real-time and soft real-time systems. We propose optimal algorithms for the HAP problem when the input is a tree or a simple path. The experiments show that our algorithms can effectively obtain the optimal solutions for simple paths and trees. For example, with our algorithms, we obtain an average reduction of 32.5% in total cost with a 90% confidence probability, compared with previous work using the worst-case scenario.
-
ABSTRACT: Loop fusion is widely used to exploit instruction-level parallelism by transforming separate loops into one loop in embedded system applications. Loop fusion, however, is not always applicable because of fusion-prevention dependencies among loops. Techniques for eliminating these dependencies are therefore necessary to fully exploit the benefits of loop fusion. In this paper we present an efficient loop fusion technique based on the loop dependency graph model and the multi-dimensional retiming concept. Legalizing fusion theorems are derived for loops to be legally fused. Polynomial-time legalizing fusion algorithms are developed to solve the loop fusion problems for 1-level loops and 2-level nested loops. Our loop fusion techniques are carefully designed to consider multiple optimization objectives, such as minimizing the code size and the critical path of the fused loop. The resultant code size can be accurately computed. The experimental results show that our loop fusion technique always significantly reduces the schedule length.
-
ABSTRACT: Switching activity and schedule length are the two most important factors that influence the energy consumption of an application executed on a VLIW (Very Long Instruction Word) processor. Considering these two factors together, we propose an instruction-level energy-minimization scheduling technique to reduce the energy consumption of applications on VLIW processors. We first formally prove that this problem is NP-complete. Then three heuristic algorithms, MSAS, MLMSA, and EMSA, are proposed. While switching activity and schedule length are given higher priority in MSAS and MLMSA, respectively, EMSA gives the best result considering both of them. The experimental results show that EMSA gives a […]% reduction in energy on average compared with the traditional list scheduling approach.
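Switching activity is commonly estimated as the number of bit flips between consecutive words on a bus. The sketch below assumes that Hamming-distance model and hypothetical 4-bit instruction encodings; it is not the paper's energy model or any of its three heuristics.

```python
def hamming(a, b):
    """Number of bit positions in which two instruction words differ."""
    return bin(a ^ b).count("1")


def switching_activity(words):
    """Total bit flips across a sequence of consecutively issued words."""
    return sum(hamming(x, y) for x, y in zip(words, words[1:]))


# Hypothetical encodings; reordering is assumed legal (no data dependencies).
original = [0b0000, 0b1111, 0b0001, 0b1110]
reordered = [0b0000, 0b0001, 0b1111, 0b1110]
```

Under this model the original order costs 11 bit flips and the reordered one costs 5, with the same schedule length. A scheduler that weighs switching activity alongside schedule length can exploit exactly this kind of freedom among independent instructions.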
-
ABSTRACT: In embedded systems, high-performance DSP needs to be performed not only with high data throughput but also with low power consumption. Reducing the power consumption of a DSP application while optimizing time performance on VLIW architectures has become an important problem. Loops are usually the most critical sections and consume a significant amount of power and time in DSP applications. Little research has been conducted on loop scheduling that minimizes both timing and switching activities. This paper develops an instruction-level loop scheduling technique to reduce both execution time and bus switching activities for applications with loops on VLIW architectures. We propose an algorithm, SAMLS (Switching-Activity Minimization Loop Scheduling), to minimize both schedule length and switching activities for applications with loops. The algorithm selects the best schedule among those generated from an initial schedule by repeatedly rescheduling nodes, minimizing schedule length and switching activities based on rotation scheduling and bipartite matching. The experimental results show that our algorithm can reduce both schedule length and bus switching activities. Compared with the work in [12], SAMLS shows an average 14.4% reduction in schedule length and an average 21.1% reduction in bus switching activities.
-
ABSTRACT: Sensor nodes usually work in dynamically changing, hard-to-predict environments and have limited lifetimes. We use a novel adaptive online energy saving (AOES) algorithm to reduce total energy consumption in heterogeneous sensor networks. Because of the uncertainty in the execution time of some tasks and the multiple working modes of each node, this paper models each varied execution time as a probabilistic random variable and saves energy by selecting the best mode assignment for each node, which is called the Mode Assignment with Probability (MAP) problem. We propose an optimal sub-algorithm, MAP Opt, to minimize the total energy consumption while satisfying the timing constraint with a guaranteed confidence probability. The experimental results show that our approach achieves significantly greater energy savings than previous work.
-
ABSTRACT: Quick construction of the Shortest Path Tree (SPT) is essential for fast routing in an interior network using link-state protocols such as OSPF and IS-IS. Whenever the network topology changes, the old SPT must be updated. In a network with a large number of nodes, recomputing the whole SPT with traditional static algorithms is very inefficient: it takes tremendous computation time and causes routing table instability through unnecessary changes to the existing SPT. In this paper, we propose an improved algorithm for dynamic SPT update to solve these problems. The proposed algorithm is based on an understanding of the dynamic update process that reduces redundancy. Only significant edges that contribute to the construction of the new SPT are considered. The complexity analysis and experimental results show that our algorithm is much better than any others in the literature.
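The contrast between full recomputation and a dynamic update can be sketched for the simplest case, a single edge whose weight decreases. The code below is an assumed, generic incremental update built on Dijkstra-style relaxation, not the paper's algorithm; the graph layout (a dict of dicts with every node present as a key) and the function names are hypothetical.

```python
import heapq


def _relax(graph, dist, parent, heap):
    """Dijkstra-style relaxation starting from the entries already in heap."""
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph[u].items():
            if d + w < dist[v]:
                dist[v], parent[v] = d + w, u
                heapq.heappush(heap, (d + w, v))


def dijkstra(graph, source):
    """Static SPT construction: returns (dist, parent) dictionaries."""
    dist = {v: float("inf") for v in graph}
    parent = {v: None for v in graph}
    dist[source] = 0
    _relax(graph, dist, parent, [(0, source)])
    return dist, parent


def decrease_edge(graph, dist, parent, u, v, new_w):
    """Update dist/parent in place after edge (u, v) drops to new_w.

    Assumes new_w is non-negative and not larger than the old weight.
    """
    graph[u][v] = new_w
    if dist[u] + new_w >= dist[v]:
        return  # the existing SPT is still valid
    dist[v], parent[v] = dist[u] + new_w, u
    _relax(graph, dist, parent, [(dist[v], v)])  # only nodes reachable via v
```

Only nodes whose distance actually improves are touched, so the rest of the tree and the routing entries derived from it stay unchanged. Weight increases and edge deletions need a different treatment and are not covered by this sketch.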
-
ABSTRACT: In adaptive mobile networks, such as those used on battlefields or in military applications, the base station configuration may change dynamically. Mobile nodes communicate via a base station and can move freely. This dynamic change, together with the constraints, makes the assignment of mobile nodes to base stations difficult. In this paper, we propose polynomial-time algorithms to find a good partitioning for mobile node assignment. We also show that the assignment of mobile nodes to base stations under bandwidth and communication constraints is NP-complete. We propose two graph models to represent mobile node communication requirements as well as base station configuration. Our simulation results show that our techniques efficiently achieve performance close to that of exhaustive approaches, which minimize how often a node is not covered by a base station in the dynamic environment.
-
ABSTRACT: In a large-scale adaptive mobile wireless network, long-distance messages can be routed through mobile servers, while nearby mobile units can contact each other through directly constructed communication channels. To achieve scalability through direct connection among mobile units and reliability sustained by mobile servers for QoS, a mixed wireless infrastructure incorporating mobile servers and ad hoc communications is investigated. A cluster of mobile units can be modelled as a mobile node. Both mobile nodes and servers move freely in adaptive mobile wireless networks. In this paper, two graph models are introduced to represent mobile node communication requirements as well as mobile server configurations. The dynamically changing topology of mobile nodes and mobile servers, together with other constraints (e.g., transmission range, bandwidth), makes the assignment of mobile nodes to mobile servers difficult. Such assignment can be formalized as a partition problem, for which finding optimal solutions is proved to be NP-complete. Based on the two modelling graphs, polynomial-time algorithms are developed for the partitioning problem such that communication among different mobile nodes is successfully switched by mobile servers. The experimental environment simulates the dynamically modified topology of a wireless network consisting of roaming mobile nodes. The simulation results show that the proposed techniques yield good assignments with performance similar to those produced by exhaustive approaches.
-
ABSTRACT: Embedded systems have strict timing and code size requirements. Retiming is one of the most important optimization techniques for improving the execution time of loops by increasing the parallelism among successive loop iterations. Traditionally, retiming has been applied at the instruction level to reduce the cycle period for single loops. While multi-dimensional (MD) retiming can explore outer loop parallelism, it introduces large overheads in loop index generation and code size due to loop transformation. In this paper, we propose a novel approach that combines iterational retiming with instructional retiming to satisfy any given timing constraint by achieving full parallelism for iterations in a partition with minimal code size. The experimental results show that, by combining iterational retiming and instructional retiming, we can achieve a 37% code size reduction compared with applying iterational retiming alone.
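How instruction-level retiming shortens the cycle period can be shown on a three-node data flow graph. The sketch below assumes the convention in which a retiming value r(u) moves delays from the incoming edges of u to its outgoing edges, so an edge u to v gets d(e) + r(u) - r(v) delays. The graph, node times, and function names are hypothetical, and the paper's iterational retiming is not modelled here.

```python
from functools import lru_cache


def retime(edges, r):
    """Apply retiming r to edge delays: d_r(u, v) = d(u, v) + r(u) - r(v)."""
    return {(u, v): d + r.get(u, 0) - r.get(v, 0) for (u, v), d in edges.items()}


def cycle_period(times, edges):
    """Longest total computation time along any zero-delay path.

    Assumes every cycle carries at least one delay, so the zero-delay
    subgraph is acyclic.
    """
    succ = {u: [] for u in times}
    for (u, v), d in edges.items():
        if d == 0:
            succ[u].append(v)

    @lru_cache(maxsize=None)
    def longest(u):
        return times[u] + max((longest(v) for v in succ[u]), default=0)

    return max(longest(u) for u in times)


# Hypothetical loop body: A -> B -> C within an iteration, C feeds A two
# iterations later.
times = {"A": 1, "B": 1, "C": 1}
edges = {("A", "B"): 0, ("B", "C"): 0, ("C", "A"): 2}
retimed = retime(edges, {"A": 1})
```

With these values the cycle period drops from 3 to 2, and every retimed edge still has a non-negative delay, so the retiming is legal. Each retiming step also adds prologue and epilogue code, which is the code size overhead the abstract aims to keep small.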
-
ABSTRACT: Software pipelining is extensively used to exploit instruction-level parallelism in loops. However, this performance optimization technique results in code size expansion. For embedded systems with very limited on-chip memory resources, code size becomes one of the most important optimization concerns. This paper presents a fundamental understanding of the relationship between code size expansion and software pipelining based on retiming. We propose a general Code-size REDuction technique (CRED) for software-pipelined loops on various kinds of processors. Our CRED algorithms integrate the code size reduction procedure with software pipelining to produce minimal code size for a target schedule length. Experiments on a set of well-known benchmarks show the effectiveness of the CRED technique in both reducing the code size of software-pipelined loops and exploring the code size/performance trade-off space.
Institutions
-
2
-
University of Texas at Dallas
-
Department of Computer Science
Dallas, TX, USA